Abstract

We study the limitations of steganography when the sender is not using any properties of the
underlying channel beyond its entropy and the ability to sample from it. On the negative side,
we show that the number of samples the sender must obtain from the channel is exponential in
the rate of the stegosystem. On the positive side, we present the first secret-key stegosystem
that essentially matches this lower bound regardless of the entropy of the underlying channel.
Furthermore, for high-entropy channels, we present the first secret-key stegosystem that
matches this lower bound statelessly (i.e., without requiring synchronized state between sender
and receiver).



Keywords: steganography, covert communication, rejection sampling, lower bound,
pseudorandomness, information hiding, huge random objects.

1 Introduction



Steganography's goal is to conceal the presence of a secret message within an innocuous-looking
communication. In other words, steganography consists of hiding a secret hiddentext message within
a public covertext to obtain a stegotext in such a way that an unauthorized observer is unable to
distinguish between a covertext with a hiddentext and one without.

The first rigorous complexity-theoretic formulation of secret-key steganography was provided by
Hopper, Langford and von Ahn [11]. In this formulation, steganographic secrecy of a stegosystem is
defined as the inability of a polynomial-time adversary to distinguish between observed distributions
of unaltered covertexts and stegotexts. (This is in contrast with many previous works, which tended
to be information-theoretic in perspective; see, e.g., [5] and other references in [11].)

1.1 Model 

In steganography, the very presence of a message must be hidden from the adversary, who must be
given no reason for suspecting that anything is unusual. This is the main difference from encryption,
which does not prevent the adversary from suspecting that a secret message is being sent, but only
from decoding the message. To formalize "unusual," some notion of usual communication must
exist.

Footnote: Preliminary version appears in TCC 2005 [5].

We adopt the model of [11] with minor changes. In it, a sender sends data to a receiver. The usual
(nonsteganographic) communication comes from the channel, which is a distribution of possible
documents sent from sender to receiver based on past communication. The channel models the
sender's decision process about what to say next in ordinary communication; thus, the sender is
given access to the channel via a sampling oracle that takes the past communication as input and
returns the next document from the appropriate probability distribution. Sender and receiver share
a secret key (public-key steganography is addressed in [18, 1]).

The adversary is assumed to also have some information about the usual communication, and 
thus about the channel. It listens to the communication and tries to distinguish the case where 
the sender and receiver are just carrying on the usual conversation (equivalently, sender is honestly 
sampling from the oracle) from the case where the sender is transmitting a hiddentext message 
m ∈ {0, 1}* (the message may even be chosen by the adversary). A stegosystem is secure if the
adversary's suspicion is not aroused — i.e., if the two cases cannot be distinguished. 

1.2 Desirable Characteristics of a Stegosystem 

Black-Box. In order to obtain a stegosystem of broad applicability, one would like to make as 
few assumptions as possible about the understanding of the underlying channel. As Hopper et 
al. [11] point out, the channel may be very complex and not easily described. For example, if the
parties are using photographs of city scenes as covertexts, it is reasonable to assume that the sender 
can obtain such photographs, but unreasonable to expect the sender and the receiver to know a 
polynomial-time algorithm that can construct such photographs from uniformly distributed random 
strings. We therefore concentrate on black-box steganography, in which the knowledge about the 
channel is limited to the sender's ability to query the sampling oracle and a bound on the channel's 
min-entropy available to sender and receiver. In particular, the receiver is not assumed to be able 
to sample from the channel. The adversary, of course, may know more about the channel. 

Efficient (in terms of running time, number of samples, rate, reliability). The running 
times of sender's and receiver's algorithms should be minimized. Affairs are slightly complicated 
by the sender's algorithm, which involves two kinds of fundamentally different operations: compu- 
tation, and channel sampling. Because obtaining a channel sample could conceivably be of much 
higher cost than performing a computation step, the two should be separately accounted for. 

Transmission rate of a stegosystem is the number of hiddentext bits transmitted per single ste- 
gotext document sent. Transmission rate is tied to reliability, which is the probability of successful 
decoding of an encoded message (and unreliability, which is one minus reliability). The goal is to 
construct stegosystems that are reliable and transmit at a high rate (it is easier to transmit at a 
high rate if reliability is low and so the receiver will not understand much of what is transmitted). 

Even if a stegosystem is black-box, its efficiency may depend on the channel distribution. We 
will be interested in the dependence on the channel min-entropy h. Ideally, a stegosystem would 
work well even for low-min-entropy channels. 

Secure. Insecurity is defined as the adversary's advantage in distinguishing stegotext from regular
channel communication (and security as one minus insecurity). Note that security, like efficiency,
may depend on the channel min-entropy. We are interested in stegosystems with insecurity as close
to 0 as possible, ideally even for low-min-entropy channels.

Stateless. It is desirable to construct stateless stegosystems, so that the sender and the receiver 
need not maintain synchronized state in order to communicate long messages. Indeed, the need for 
synchrony may present a particular problem in steganography in case messages between sender and 
receiver are dropped or arrive out of order. Unlike in counter-mode symmetric encryption, where 
the counter value can be sent along with the ciphertext in the clear, here this is not possible: the 
counter itself would also have to be steganographically encoded to avoid detection, which brings us 
back to the original problem of steganographically encoding multibit messages. 

1.3 Our Contributions 

We study the optimal efficiency achievable by black-box steganography, and present secret-key 
stegosystems that are nearly optimal. Specifically, we demonstrate the following results: 

• A lower bound, which states that a secure and reliable black-box stegosystem with rate of $w$
bits per document sent requires the encoder to take at least $c2^w$ samples from the channel
per $w$ bits sent, for some constant $c$. The value of $c$ depends on security and reliability, and
tends to $1/(2e)$ as security and reliability approach 1. This lower bound applies to secret-key
as well as public-key stegosystems.

• A stateful black-box secret-key stegosystem STF that transmits $w$ bits per document sent,
takes $2^w$ samples per $w$ bits, and has unreliability of $2^{-h+w}$ per document (recall that $h$ is
the channel min-entropy) and negligible insecurity, which is independent of the channel. (A very
similar construction was independently discovered by Hopper [12, Construction 6.10].)

• A stateless black-box secret-key stegosystem STL that transmits $w$ bits per document sent,
takes $2^w$ samples per $w$ bits, and has unreliability $2^{-\Theta(2^h)}$ and insecurity negligibly close to
$l^2 2^{-h+2w}$ for $lw$ bits sent.

Note that for both stegosystems, the rate vs. number of samples tradeoff is very close to the 
lower bound — in fact, for channels with sufficient entropy, the optimal rate allowed by the lower 
bound and the achieved rate differ by log2 2e < 2.5 bits (and some of that seems due to slack in the 
bound). Thus, our bound is quite tight, and our stegosystems quite efficient. The proof of the lower 
bound involves a surprising application of the huge random objects of [8], specifically of the truthful 
implementation of a boolean function with interval-sum queries. The lower bound demonstrates 
that significant improvements in stegosystem performance must come from assumptions about the 
channel. 
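The size of this gap can be checked numerically. The following minimal sketch assumes the limiting constant $c = 1/(2e)$ from the lower bound and a hypothetical per-document sample budget; the function names are illustrative and do not come from the constructions themselves.

```python
import math

def max_rate_allowed(samples_per_doc: float, c: float = 1 / (2 * math.e)) -> float:
    """Largest rate w (in bits) with c * 2**w <= samples_per_doc (lower-bound limit)."""
    return math.log2(samples_per_doc / c)

def rate_achieved(samples_per_doc: float) -> float:
    """Rate of a stegosystem that spends 2**w samples per w bits (as STF and STL do)."""
    return math.log2(samples_per_doc)

budget = 1024  # hypothetical number of channel samples per document
gap = max_rate_allowed(budget) - rate_achieved(budget)
print(round(gap, 3))  # 2.443, i.e. log2(2e) < 2.5
```

The gap does not depend on the budget: it is always $\log_2 2e$ bits.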

The stateless stegosystem STL can be used whenever the underlying channel distribution has
sufficient min-entropy $h$ for the insecurity $l^2 2^{-h+2w}$ to be acceptably low. It is extremely simple,
requiring just evaluations of a pseudorandom function for encoding and decoding, and very reliable.

If the underlying channel does not have sufficient min-entropy, then the stateful stegosystem
STF can be used, because its insecurity is independent of the channel. While it requires shared
synchronized state between sender and receiver, the state information is only a counter of the
number of documents sent so far. If the min-entropy of the channel is so low that the unreliability of
$2^{-h+w}$ per document is too high for the application, the reliability of this stegosystem can be improved
through the use of error-correcting codes over the $2^w$-ary alphabet (applied to the hiddentext
before stegoencoding), because failure to decode correctly is independent for each $w$-bit block.
Error-correcting codes can increase reliability to be negligibly close to 1 at the expense of reducing
the asymptotic rate from $w$ to $w - (h+2)2^{-h+w}$. Finally, of course, the min-entropy of any channel
can be improved from $h$ to $nh$ by viewing $n$ consecutive samples as a single draw from the channel;
if $h$ is extremely small to begin with, this will be more efficient than using error-correcting codes
(this improvement requires both parties to be synchronized modulo $n$, which is not a problem in
the stateful case).

This stateful stegosystem STF also admits a few variants. First, the logarithmic amount of
shared state can be eliminated at the expense of adding a linear amount of private state to the
sender and reducing reliability slightly (as further described in Section 4.1), thus removing the need for
synchronization between the sender and the receiver. Second, under additional assumptions about
the channel (e.g., if each document includes time sent, or has a sequence number), STF can be
made completely stateless. The remarks of this paragraph and the previous one can be equally
applied to [12, Construction 6.10].

1.4 Related Work 

The bibliography on the subject of steganography is extensive; we do not review it all here, but 
rather recommend the references in [11].

Constructions. In addition to introducing the complexity-theoretic model for steganography,
[11] proposed two constructions of black-box secret-key stegosystems, called Construction 1 and
Construction 2.

Construction 1 is stateful and, like our stateful construction STF, boasts negligible insecurity
regardless of the channel. However, it can transmit only 1 bit per document, and its reliability is
limited by $1/2 + (1/4)(1 - 2^{-h})$ per document sent, which means that, regardless of the channel, each
hiddentext bit has probability at least 1/4 of arriving incorrectly (thus, to achieve high reliability,
error-correcting codes with expansion factor of at least $1/(1 - H_2(1/4)) \approx 5$ are needed, where
$H_2$ is the binary entropy function). In contrast,
STF has reliability that is exponentially (in the min-entropy) close to 1, and thus works well for
any channel with sufficient entropy. Furthermore, it can transmit at rate $w$ for any $w < h$, provided
that the encoder has sufficient time for the $2^w$ samples required. It can be seen as a generalization
of Construction 1.
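The expansion factor quoted above follows from the capacity of a binary symmetric channel with crossover probability 1/4. A minimal sketch of that arithmetic (the helper names are ours, chosen for illustration):

```python
import math

def binary_entropy(p: float) -> float:
    """Binary entropy H_2(p) in bits."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def min_expansion_factor(bit_error: float) -> float:
    """Inverse of the binary-symmetric-channel capacity 1 - H_2(bit_error)."""
    return 1.0 / (1.0 - binary_entropy(bit_error))

expansion = min_expansion_factor(0.25)
print(round(expansion, 1))  # 5.3
```

Any code that corrects a per-bit error rate of 1/4 must therefore expand the hiddentext by a factor of more than 5.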

Construction 2 of [11] is stateless. Like the security of our stateless construction STL, its security
depends on the min-entropy of the underlying channel. While no exact analysis is provided in [11],
the insecurity of Construction 2 seems to be roughly $\sqrt{l}\,2^{(-h+w)/2}$ (due to the fact that the adversary
sees $l$ samples either from $C$ or from a known distribution with bias roughly $2^{(-h+w)/2}$ caused by
a public extractor; see Appendix E), which is higher than the insecurity of STL (unless $l$ and $w$
are so high that $h < 3w + 3\log l$, in which case both constructions are essentially insecure, because
insecurity is higher than the inverse of the encoder's running time $l2^w$). Reliability of Construction
2, while not analyzed in [11], seems close to the reliability of STL. The rate of Construction 2 is
lower (if other parameters are kept the same), due to the need for randomized encryption of the
hiddentext, which necessarily expands the number of bits sent.
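The crossover point $h = 3w + 3\log l$ can be verified directly from the two rough insecurity estimates. The sketch below treats both expressions as the approximations stated in the text, not as exact bounds; the parameter values are arbitrary examples.

```python
import math

def insec_stl(l: int, w: int, h: float) -> float:
    """Rough insecurity estimate for STL: l^2 * 2^(-h + 2w)."""
    return l ** 2 * 2.0 ** (-h + 2 * w)

def insec_construction2(l: int, w: int, h: float) -> float:
    """Rough insecurity estimate for Construction 2: sqrt(l) * 2^((-h + w)/2)."""
    return math.sqrt(l) * 2.0 ** ((-h + w) / 2)

def crossover_min_entropy(l: int, w: int) -> float:
    """Min-entropy at which the two estimates coincide: 3w + 3 log2(l)."""
    return 3 * w + 3 * math.log2(l)

l, w = 1024, 4                        # example parameters (assumptions)
h_star = crossover_min_entropy(l, w)  # 42.0 for these parameters

# At the crossover both estimates equal 1 / (l * 2^w), the inverse of the
# encoder's running time; above it, STL has the lower insecurity estimate.
at_cross = (insec_stl(l, w, h_star), insec_construction2(l, w, h_star))
above = (insec_stl(l, w, 60), insec_construction2(l, w, 60))
```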

It is important to note that the novelty of STL is not the construction itself, but rather its
analysis. Specifically, its stateful variant appeared as Construction 1 in the Extended Abstract of
[11], but the analysis of the Extended Abstract was later found to be flawed by [13]. Thus, the full
version of [11] included a different Construction 1. We simply revive this old construction, make it
stateless, generalize it to $w$ bits per document, and, most importantly, provide a new analysis for
it.

Footnote: Construction 2, which, strictly speaking, is not presented as a black-box construction in [11], can be made
black-box through the use of extractors (such as universal hash functions) in place of unbiased functions, as shown
in [18].

In addition to the two constructions of [11] described above, and independently of our work, Hopper
[12] proposed two more constructions: Constructions 6.10 (MultiBlock) and 3.15 (NoState).
As already mentioned, MultiBlock is essentially the same as our STF. NoState is an interesting
variation of Construction 1 of [11] that addresses the problem of maintaining shared state at the
expense of lowering the rate even further.

Bounds on the Rate and Efficiency. Hopper [12, Section 6.2] establishes a bound on the
rate vs. efficiency tradeoff. Though quantitatively similar to ours (in fact, tighter by the constant of
2e), this bound applies only to a restricted class of black-box stegosystems: essentially, stegosystems
that encode and decode one block at a time and sample a fixed number of documents per block.
The bound presented in this paper applies to any black-box stegosystem, as long as it works for a
certain reasonable class of channels, and thus can be seen as a generalization of the bound of [12].
Our proof techniques are quite different from those of [12], and we hope they may be of independent
interest. We refer the reader to Section 3.4 for an elaboration. Finally, it should be noted that
non-black-box stegosystems can be much more efficient; see [11, 18, 14, 15].

2 Definitions 

2.1 Steganography 

The definitions here are essentially those of [11]. We modify them in three ways. First, we view the
channel as producing documents (symbols in some, possibly very large, alphabet) rather than bits. 
This simplifies notation and makes min-entropy of the channel more explicit. Second, we consider 
stegosystem reliability as a parameter rather than a fixed value. Third, we make the length of the 
adversary's description (and the adversary's dependence on the channel) explicit in the definition. 

The Channel. Let $\Sigma$ be an alphabet; we call the elements of $\Sigma$ documents. A channel $C$ is
a map that takes a history $\mathcal{H} \in \Sigma^*$ as input and produces a probability distribution $D_{\mathcal{H}}$ on
$\Sigma$. A history $\mathcal{H} = s_1 s_2 \ldots s_n$ is legal if each subsequent symbol is obtainable given the previous
ones, i.e., $\Pr_{D_{s_1 s_2 \ldots s_{i-1}}}[s_i] > 0$. Min-entropy of a distribution $D$ is defined as
$H_\infty(D) = \min_{s \in D} \{-\log_2 \Pr_D[s]\}$. Min-entropy of $C$ is $\min_{\mathcal{H}} H_\infty(D_{\mathcal{H}})$, where the minimum is taken over
legal histories $\mathcal{H}$.
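As a concrete illustration of these two definitions, the following sketch computes the min-entropy of a finite distribution and of a toy channel. Representing a distribution as a dictionary and a channel as a function from histories to such dictionaries is our assumption for the example.

```python
import math
from typing import Callable, Dict, Hashable, Iterable, Tuple

Distribution = Dict[Hashable, float]

def min_entropy(dist: Distribution) -> float:
    """H_inf(D): the minimum over the support of -log2 Pr_D[s]."""
    return min(-math.log2(p) for p in dist.values() if p > 0)

def channel_min_entropy(channel: Callable[[Tuple], Distribution],
                        legal_histories: Iterable[Tuple]) -> float:
    """Min-entropy of a channel: minimum of H_inf(D_H) over the given legal histories."""
    return min(min_entropy(channel(h)) for h in legal_histories)

uniform8 = {i: 1 / 8 for i in range(8)}         # uniform on 8 documents
skewed = {"a": 0.5, "b": 0.25, "c": 0.25}       # most likely document has probability 1/2

def toy_channel(history: Tuple) -> Distribution:
    """Hypothetical channel: uniform on the first draw, skewed afterwards."""
    return uniform8 if len(history) == 0 else skewed

print(min_entropy(uniform8), min_entropy(skewed))  # 3.0 1.0
```

The channel's min-entropy is governed by its worst history: for the toy channel it is 1 bit, even though the first document carries 3 bits.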

Our stegosystems will make use of a channel sampling oracle $M$, which, on input $\mathcal{H}$, outputs a
symbol $s$ according to $D_{\mathcal{H}}$. A stegosystem may be designed for a particular $\Sigma$ and min-entropy of
$C$.

Definition 1. A black-box secret-key stegosystem for the alphabet $\Sigma$ is a pair of probabilistic
polynomial time algorithms $ST = (SE, SD)$ such that, for a security parameter $\kappa$,

1. $SE$ has access to a channel sampling oracle $M$ for a channel $C$ on $\Sigma$ and takes as input
a randomly chosen key $K \in \{0,1\}^\kappa$, a string $m \in \{0,1\}^*$ (called the hiddentext), and the
channel history $\mathcal{H}$. It returns a string of symbols $s_1 s_2 \ldots s_l \in \Sigma^*$ (called the stegotext).

2. $SD$ takes as input a key $K \in \{0,1\}^\kappa$, a stegotext $s_1 s_2 \ldots s_l \in \Sigma^*$, and a channel history $\mathcal{H}$,
and returns a hiddentext $m \in \{0,1\}^*$.

We further assume that the length $l$ of the stegotext output by $SE$ depends only on the length of
hiddentext $m$ but not on its contents.

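Stegosystem Reliability. The reliability of a stegosystem $ST$ with security parameter $\kappa$ for a
channel $C$ and messages of length $\mu$ is defined as

$$\mathrm{Rel}_{ST(\kappa),C,\mu} = \min_{m \in \{0,1\}^\mu,\, \mathcal{H}} \left\{ \Pr_{K \in \{0,1\}^\kappa} \left[ SD(K, SE^M(K, m, \mathcal{H}), \mathcal{H}) = m \right] \right\}.$$

Unreliability is defined as $\mathrm{UnRel}_{ST(\kappa),C,\mu} = 1 - \mathrm{Rel}_{ST(\kappa),C,\mu}$.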

The Adversary. We consider only passive adversaries who mount a chosen hiddentext attack on
$ST$ (stronger adversarial models for steganography have also been considered; see, e.g., [11, 18, 1]).
The goal of such an adversary is to distinguish whether it is seeing encodings of the hiddentext it
supplied to the encoder or simply random draws from the channel. To this end, define an oracle
$O(\cdot, \mathcal{H})$ that produces random draws from the channel starting with history $\mathcal{H}$ as follows: on input
$m \in \{0,1\}^*$, $O$ computes the length $l$ of the stegotext that $SE^M(K, m, \mathcal{H})$ would have output and
outputs $s_1 s_2 \ldots s_l$, where each $s_i$ is drawn according to $D_{\mathcal{H} \circ s_1 s_2 \ldots s_{i-1}}$.

Definition 2. $W$ is a $(t, d, q, \lambda)$ passive adversary for stegosystem $ST$ if

1. $W$ runs in expected time $t$ (including the running time needed by the stegoencoder to answer
its queries) and has description of length $d$ (in some canonical language).

2. $W$ has access to $C$ via the sampling oracle $M(\cdot)$.

3. $W$ can make an expected number of $q$ queries of combined length $\lambda$ bits to an oracle which
is either $SE^M(K, \cdot, \cdot)$ or $O(\cdot, \cdot)$.

4. $W$ outputs a bit indicating whether it was interacting with $SE$ or with $O$.

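Stegosystem Security. The advantage $\mathrm{Adv}^{SS}$ (here SS stands for "Steganographic Secrecy")
of $W$ against $ST$ with security parameter $\kappa$ for a channel $C$ is defined as

$$\mathrm{Adv}^{SS}_{ST(\kappa),C}(W) = \left| \Pr_{K \leftarrow \{0,1\}^\kappa}\left[W^{M,SE^M(K,\cdot,\cdot)} = 1\right] - \Pr\left[W^{M,O(\cdot,\cdot)} = 1\right] \right|.$$

For a given $(t, d, q, \lambda)$, the insecurity of a stegosystem $ST$ with respect to channel $C$ is defined as

$$\mathrm{InSec}^{SS}_{ST(\kappa),C}(t, d, q, \lambda) = \max_{(t,d,q,\lambda)\ \text{adversary}\ W} \left\{ \mathrm{Adv}^{SS}_{ST(\kappa),C}(W) \right\},$$

and security Sec as 1 − InSec.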

Note that the adversary's algorithm can depend on the channel C, subject to the restriction 
on the algorithm's total length d. In other words, the adversary can possess some description of 
the channel in addition to the black-box access provided by the channel oracle. This is a mean- 
ingful strengthening of the adversary: indeed, it seems imprudent to assume that the adversary's 
knowledge of the channel is limited to whatever is obtainable by black-box queries (for instance, 
the adversary has some idea of what a reasonable email message or photograph should look like). It does
not contradict our focus on black-box steganography: it is prudent for the honest parties to avoid 
relying on particular properties of the channel, while it is perfectly sensible for the adversary, in 
trying to break the stegosystem, to take advantage of whatever information about the channel is 
available. 
2.2 Pseudorandom Functions

We use pseudorandom functions [7] as a tool. Because the adversary in our setting has access to the 
channel, any cryptographic tool used must be secure even given the information provided by the 
channel. Thus, the underlying assumption for our constructions is the existence of pseudorandom 
functions that are secure given the channel oracle, which is equivalent [9] to the existence of one-way
functions that are secure given the channel oracle. This is the minimal assumption needed for
steganography [11].

Let $\mathcal{F} = \{F_{seed}\}_{seed \in \{0,1\}^*}$ be a family of functions, all with the same domain and range. For a
probabilistic adversary $A$ and channel $C$ with sampling oracle $M$, the PRF-advantage of $A$ over $\mathcal{F}$
is defined as

$$\mathrm{Adv}^{PRF}_{\mathcal{F}(n),C}(A) = \left| \Pr_{seed \leftarrow \{0,1\}^n}\left[A^{M,F_{seed}(\cdot)} = 1\right] - \Pr_g\left[A^{M,g(\cdot)} = 1\right] \right|,$$

where $g$ is a random function with the same domain and range. For a given $(t,d,q)$, the insecurity
of a pseudorandom function family $\mathcal{F}$ with respect to channel $C$ is defined as

$$\mathrm{InSec}^{PRF}_{\mathcal{F}(n),C}(t, d, q) = \max_{(t,d,q)\ \text{adversary}\ A} \left\{ \mathrm{Adv}^{PRF}_{\mathcal{F}(n),C}(A) \right\},$$

where the maximum is taken over all adversaries that run in expected time $t$, whose description
size is at most $d$, and that make an expected number of $q$ queries to their oracles.

The existence of pseudorandom functions is also the underlying assumption for our lower bound; 
however, for the lower bound, we do not need to give the adversary access to a channel oracle 
(because we construct the channel). To distinguish this weaker assumption, we will omit the 
subscript C from InSec. 

3 The Lower Bound 

Recall that we define the rate of a stegosystem as the average number of hiddentext bits per document
sent (this should not be confused with the average number of hiddentext bits per bit sent; note also
that this is the sender's rate, not the rate of information actually decoded by the receiver, which is
lower due to unreliability). We set out to prove that a reliable stegosystem with black-box access
to the channel with rate $w$ must make roughly $l2^w$ queries to the channel to send a message of
length $lw$. Intuitively, this should be true because each document carries $w$ bits of information on
average, but since the encoder knows nothing about the channel, it must keep on sampling until it
gets the encoding of those $w$ bits, which amounts to $2^w$ samples on average.
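This intuition can be illustrated with a small simulation of rejection sampling. The sketch below is not the STF or STL construction: it assumes a toy channel that is uniform over $2^{20}$ integer documents and uses HMAC-SHA256 as a stand-in for the pseudorandom function; all names and parameters are hypothetical.

```python
import hashlib
import hmac
import random

def prf(key: bytes, doc: int, w: int) -> int:
    """Stand-in PRF (HMAC-SHA256, an assumption): maps a document to a w-bit value."""
    digest = hmac.new(key, doc.to_bytes(8, "big"), hashlib.sha256).digest()
    return int.from_bytes(digest[:4], "big") & ((1 << w) - 1)

def encode_block(key: bytes, block: int, w: int, sample):
    """Sample the channel until a document maps to the w-bit block.

    Returns the document and the number of samples taken.
    """
    count = 0
    while True:
        doc = sample()
        count += 1
        if prf(key, doc, w) == block:
            return doc, count

def average_samples(w: int, trials: int = 2000, seed: int = 0) -> float:
    """Average number of channel samples needed to encode one random w-bit block."""
    rng = random.Random(seed)
    key = b"hypothetical shared key"
    sample = lambda: rng.randrange(2 ** 20)  # toy channel: uniform over 2^20 documents
    total = 0
    for _ in range(trials):
        block = rng.randrange(2 ** w)
        doc, count = encode_block(key, block, w, sample)
        assert prf(key, doc, w) == block     # the receiver decodes by one PRF evaluation
        total += count
    return total / trials

avg = average_samples(w=4)
print(avg)  # close to 2**4 = 16 samples per block
```

Each sample hits the desired $w$-bit value with probability about $2^{-w}$, so the number of samples per block is roughly geometric with mean $2^w$.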

In particular, for the purposes of this lower bound it suffices to consider a restricted class of
channels: the distribution of the sample depends only on the length of the history (not on its
contents). We will write $D_1, D_2, \ldots, D_i, \ldots$ instead of $D_{\mathcal{H}}$, where $i$ is the length of the history $\mathcal{H}$.
Furthermore, it will suffice for us to consider only distributions $D_i$ that are uniform on a subset of
$\Sigma$. We will use the notation $D_i$ both for the distribution and for the subset (as is often done for
uniform distributions).

Let $H$ denote the number of elements of $D_i$ (note that $H = |D_i| = 2^h$), and let $S = |\Sigma|$. Because
the encoder knows the min-entropy $h$ of the channel, if $H = S$, then the encoder knows the channel
completely (it is simply uniform on $\Sigma$). Therefore, if $H = S$, then there is no meaningful lower
bound on the number of queries made by the encoder to the channel oracle, because it does not
need to make any queries in order to sample from the channel. Thus, we require that $H < S$ (our
bounds will depend slightly on the ratio of $S$ to $S - H$).



Our proof proceeds in two parts. First, we consider a stegoencoder $SE$ that does not output
anything that it did not receive as a response from the channel-sampling oracle (intuitively, every
good stegoencoder should work this way, because otherwise it may output something that is not in
the channel, and thus be detected). To be reliable (that is, to find a set of documents that decode
to the desired message), such an encoder has to make many queries, as shown in Lemma 1. Second,
we formalize the intuition that a good stegoencoder should output only documents it received from
the channel-sampling oracle: we show that to be secure (i.e., not output something easily detectable
by the adversary), a black-box $SE$ cannot output anything it did not receive from the oracle: if it
does, it has a $1 - H/S$ chance of being detected.

The second half of the proof is somewhat complicated by the fact that we want to assume
security only against bounded adversaries: namely, ones whose description size and running time
are polynomial in the description size and running time of the encoder (in particular, polynomial
in $\log S$ rather than $S$). Thus, the adversary cannot detect a bad stegoencoder by simply
carrying a list of all the entries in $D_i$ for each $i$ and checking if the $i$th document sent by the
stegoencoder is in $D_i$, because that would make the adversary's description too long.

This requires us to come up with pseudorandom subsets $D_i$ of $\Sigma$ that have concise descriptions
and high min-entropy and whose membership is impossible for the stegoencoder to predict. In
order to do that, we utilize techniques from the truthful implementation of a boolean function
with interval-sum queries of [8] (truthfulness is important, because min-entropy has to be high
unconditionally).



3.1 Lower Bound When Only Query Results Are Output 

If $D_1, D_2, \ldots$ are subsets of $\Sigma$, then we write $D = D_1 \times D_2 \times \ldots$ to denote the channel that, on
history length $i$, outputs a uniformly random element of $D_i$. If $|D_1| = |D_2| = \ldots = 2^h$, then we say
that $D$ is a flat $h$-channel. We will consider flat $h$-channels.

Normally, one would think of the channel sampling oracle for D as making a fresh random choice 
from Di when queried on history length i. However, from the point of view of the stegoencoder, it 
does not matter if the choice was made by the oracle in response to the query, or before the query 
was even made. It will be easier for us to think of the oracle as having already made and written 
down countably many samples from each $D_i$. We will denote the $j$th sample from $D_i$ by $s_{i,j}$. Thus,
suppose that the oracle has already chosen

$s_{1,1}, s_{1,2}, \ldots, s_{1,j}, \ldots$ from $D_1$,
$s_{2,1}, s_{2,2}, \ldots, s_{2,j}, \ldots$ from $D_2$,
$\ldots$
$s_{i,1}, s_{i,2}, \ldots, s_{i,j}, \ldots$ from $D_i$,
$\ldots$

We will denote the string containing all these samples by $S$ and refer to it as a draw sequence from
the channel. We will give our stegoencoder access to an oracle (also denoted by $S$) that, each time
it is queried with $i$, returns the next symbol from the sequence $s_{i,1}, s_{i,2}, \ldots, s_{i,j}, \ldots$. Choosing $S$ at
random and giving the stegoencoder access to it is equivalent to giving the encoder access to the
usual channel-sampling oracle $M$ for our channel $D$.

Denote the stegoencoder's output by $SE^S(K, m, \mathcal{H}) = t = t_1 t_2 \ldots t_l$, where $t_i \in \Sigma$. Because
we assume in this section that the stegoencoder outputs only documents it got from the channel
oracle, $t_i$ is an element of the sequence $s_{i,1}, s_{i,2}, \ldots, s_{i,j}, \ldots$. If $t_i$ is the $j$th element of this sequence,
then it took $j$ queries to produce it. We will denote by weight of $t$ with respect to $S$ the number
of queries it took to produce $t$: $W(t, S) = \sum_{i=1}^{l} \min\{j \mid s_{i,j} = t_i\}$. In the next lemma, we prove
(by looking at the decoder) that for any $S$ most messages have high weight, i.e., must take many
queries to encode.
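The draw-sequence view of the oracle and the weight of an output can be sketched as follows. This is an illustrative model only: the class name, the per-index seeding, and the search limit are our assumptions, and the samples are materialized lazily rather than written down in advance (the sequence for each $D_i$ is nevertheless fixed by the seed, independent of query order).

```python
import random
from collections import defaultdict

class DrawSequence:
    """Draw sequence S for a flat channel: the j-th sample from D_i is fixed per (seed, i, j)."""

    def __init__(self, subsets, seed=0):
        self._subsets = subsets            # subsets[i-1] is D_i, given as a list
        self._seed = seed
        self._rngs = {}                    # one generator per index i
        self._samples = defaultdict(list)  # i -> samples s_{i,1}, s_{i,2}, ... drawn so far
        self._asked = defaultdict(int)     # i -> number of queries answered so far

    def _extend(self, i, j):
        """Materialize samples s_{i,1..j} if not yet present."""
        rng = self._rngs.setdefault(i, random.Random(self._seed * 1_000_003 + i))
        while len(self._samples[i]) < j:
            self._samples[i].append(rng.choice(self._subsets[i - 1]))

    def query(self, i):
        """Oracle call: return the next symbol of the sequence s_{i,1}, s_{i,2}, ..."""
        self._asked[i] += 1
        self._extend(i, self._asked[i])
        return self._samples[i][self._asked[i] - 1]

    def weight(self, t, limit=10_000):
        """W(t, S) = sum over i of min{ j : s_{i,j} = t_i }."""
        total = 0
        for i, t_i in enumerate(t, start=1):
            j = 1
            while True:
                if j > limit:
                    raise ValueError("t_i not found within the search limit")
                self._extend(i, j)
                if self._samples[i][j - 1] == t_i:
                    break
                j += 1
            total += j
        return total

S = DrawSequence([[0, 1, 2, 3], [4, 5, 6, 7]], seed=1)
t1, t2 = S.query(1), S.query(2)   # first samples from D_1 and D_2
print(S.weight([t1, t2]))         # 2: each document is the first sample of its sequence
```

An encoder that outputs the first sample at every position has weight $l$; outputs that require more sampling have correspondingly higher weight.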

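Lemma 1. Let $F : \Sigma^* \to \{0,1\}^*$ be an arbitrary (possibly unbounded) deterministic stegodecoder
that takes a sequence $t \in \Sigma^l$ and outputs a message $m$ of length $lw$ bits.

Then the probability that a random $lw$-bit message has an encoding of weight significantly less
than $(1/e)l2^w$ is small. More precisely, for any $S \in \Sigma^{**}$ and any $N \in \mathbb{N}$:

$$\Pr_{m \in \{0,1\}^{lw}}\left[(\exists t \in \Sigma^l)\; F(t) = m \wedge W(t, S) \le N\right] \le \frac{\binom{N}{l}}{2^{lw}} \le \left(\frac{Ne}{l2^w}\right)^l.$$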

Proof. Simple combinatorics show that the number of different sequences $t$ that have weight at
most $N$ (and hence the number of messages that have encodings of weight at most $N$) is at most
$\binom{N}{l}$: indeed, it is simply the number of positive integer solutions to $j_1 + \ldots + j_l \le N$, which is
the number of ways to put $l$ bars among $N - l$ stars (the number of stars to the right of the $i$th
bar corresponds to $j_i - 1$), or, equivalently, the number of ways to choose $l$ positions out of $N$. The
total number of messages is $2^{lw}$. The last inequality follows from $\binom{N}{l} \le (Ne/l)^l$ (which is a standard
combinatorics fact and follows from $k! > (k/e)^k$, which in turn follows by induction on $k$ from
$e > (1 + 1/k)^k$). □
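The counting step in this proof can be checked by brute force for small parameters. The sketch below enumerates the positive integer solutions directly and compares the two bounds; the function names and the sample parameters are ours.

```python
import math
from itertools import product

def count_low_weight(l: int, N: int) -> int:
    """Number of tuples (j_1, ..., j_l) of positive integers with j_1 + ... + j_l <= N."""
    return sum(1 for js in product(range(1, N + 1), repeat=l) if sum(js) <= N)

def lemma1_bounds(l: int, w: int, N: int):
    """Return (C(N, l) / 2^(lw), (N e / (l 2^w))^l), the two bounds on the probability."""
    exact = math.comb(N, l) / 2 ** (l * w)
    relaxed = (N * math.e / (l * 2 ** w)) ** l
    return exact, relaxed

# Brute-force count agrees with the binomial coefficient.
assert count_low_weight(3, 7) == math.comb(7, 3) == 35

# Example: l = 4 documents, w = 3 bits each, weight budget N = 8.
exact, relaxed = lemma1_bounds(4, 3, 8)
print(exact <= relaxed)  # True
```

With $N$ well below $(1/e)l2^w$, both bounds are small, which is the content of the lemma.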

Our lower bound applies when a stegosystem is used to encode messages drawn uniformly from 
bit strings of equal length. It can easily be extended to messages drawn from a uniform distribution 
on any set. If the messages are not drawn from a uniform distribution, then, in principle, they can 
be compressed before transmission, thus requiring less work on the part of the stegoencoder. We 
do not provide a lower bound in such a case, because any such lower bound would depend on the 
compressibility of the message source. 

3.2 Secure Stegosystems Almost Always Output Query Answers 

The next step is to prove that the encoder of a secure black-box stegosystem must output only 
what it gets from the oracle, or else it has a high probability of outputting something not in the 
channel. Assume that $D$ is a flat $h$-channel chosen uniformly at random. For $t = t_1 \ldots t_l \in \Sigma^*$,
let $t \in D$ denote that $t_i$ is in $D_i$ for each $i$. In the following lemma, we demonstrate that, if the
encoder's output $t$ contains a document that it did not receive as a response to a query, the chances
that $t \in D$ are at most $H/S$.

Before stating the lemma, we define the set $E$ of all possible flat $h$-channels and draw sequences
consistent with them: $E = \{(D, S) \mid s_{i,j} \in D_i\}$. We will be taking probabilities over $E$. Strictly
speaking, $E$ is an infinite set, because we defined $D$ to be countable and $S$ to have countably
many samples from each Di. For clarity, it may be easiest to think of truncating these countable 
sequences to a sufficiently large value beyond which no stegoencoder will ever go, thus making E 
finite, and then use the uniform distribution on E. Formally, E can be defined as a product of 
countably many discrete probability spaces (see, e.g., [6, Section 9.6]), with uniform distribution 
on each. 

Lemma 2. Consider any deterministic procedure $A$ that is given oracle access to a random flat
$h$-channel $D$ and outputs $t = t_1 t_2 \ldots t_l \in \Sigma^*$ (think of $A$ as the stegoencoder running on some input
key, message, channel history, and fixed randomness). Provided that $h$ is sufficiently smaller than
$\log S$, if $A$ outputs something it did not get from the oracle, then the probability that $t \in D$ is low.
More precisely, let $Q_i$ be the set of responses $A$ received to its queries from the $i$th channel $D_i$.
Define the following two events:

• nonqueried: $Nq = \{(D, S) \in E \mid (\exists i)\, t_i \notin Q_i\}$

• in support: $Ins = \{(D, S) \in E \mid t \in D\}$

Then:

$$\Pr_{(D,S) \in E}[Ins \wedge Nq] \le \frac{H}{S}.$$

Proof. If $A$ were always outputting just a single value ($l = 1$), the proof would be trivial: seeing
some samples from a random $D_i$ does not help $A$ come up with another value from $D_i$, and $D_i$
makes up only an $H/S$ fraction of all possible outputs of $A$. The proof below is a generalization
of this argument for $l > 1$, with care to avoid simply taking the union bound, which would get us
$lH/S$ instead of $H/S$.

Let $Nq_i = \{(D, S) \in E \mid t_1 \in Q_1, t_2 \in Q_2, \ldots, t_{i-1} \in Q_{i-1}, t_i \notin Q_i\}$ be the event that $t_i$ is the first
element of the output that was not returned by the oracle as an answer to a query. Observe that
$\bigcup_i Nq_i = Nq$ and that the $Nq_i$ are disjoint events and, therefore, $\sum_i \Pr[Nq_i] = \Pr[Nq] \le 1$. Now the probability
we are interested in is

$$\Pr[Ins \wedge Nq] = \sum_i \Pr[Ins \wedge Nq_i] = \sum_i \Pr[Ins \mid Nq_i]\,\Pr[Nq_i].$$

To bound $\Pr[Ins \mid Nq_i]$, fix any

$$S = s_{1,1}, s_{1,2}, \ldots, s_{1,q_1};\ s_{2,1}, s_{2,2}, \ldots, s_{2,q_2};\ \ldots$$

such that $A^S$ asks exactly $q_1$ queries from $D_1$, $q_2$ queries from $D_2$, .... Note that such $S$ determines
the behavior of $A$, including its output. Assume that, for this $S$, the event $Nq_i$ happens. We will take
the probability $\Pr[Ins \mid Nq_i]$ over a random $D$ consistent with $S$ (i.e., for which $s_{1,1}, s_{1,2}, \ldots, s_{1,q_1} \in D_1$,
$s_{2,1}, s_{2,2}, \ldots, s_{2,q_2} \in D_2$, ...). This probability can be computed simply as follows: if $q'_i$ is the
number of distinct elements in $s_{i,1}, s_{i,2}, \ldots, s_{i,q_i}$, then there are $\binom{S - q'_i}{H - q'_i}$ equally likely choices for $D_i$
(because $q'_i$ elements of $D_i$ are already determined). However, for $Ins$ to happen, $D_i$ must also
contain $t_i$, which is not among $s_{i,1}, s_{i,2}, \ldots, s_{i,q_i}$ (because we assumed $Nq_i$ happens). The choices
of $D_1, \ldots, D_{i-1}, D_{i+1}, \ldots$ do not matter. Therefore,

$$\Pr[Ins \mid Nq_i] \le \binom{S - q'_i - 1}{H - q'_i - 1} \Big/ \binom{S - q'_i}{H - q'_i} = \frac{H - q'_i}{S - q'_i} \le \frac{H}{S}.$$

The above probability is for any fixed $\vec{S}$ of the right length and randomly chosen $\vec{D}$ consistent
with $\vec{S}$. Therefore, it also holds for randomly chosen $(\vec{D},\vec{S}) \in E$, because the order in which $\vec{S}$ and
$\vec{D}$ are chosen and the values in $\vec{S}$ beyond what $A$ queries do not affect the probability. We thus
have

\[ \Pr_{(\vec{D},\vec{S}) \in E}[Ins \wedge Nq] = \sum_i \Pr[Ins \mid Nq_i] \Pr[Nq_i] \le \sum_i \frac{H}{S} \Pr[Nq_i] \le \frac{H}{S} . \]

□ 
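The counting step in the proof of Lemma 2 can be checked numerically. The sketch below is an illustration only: the function name and the sample parameters are ours, not the paper's, and it covers a single position $i$. It computes the fraction of size-$H$ sets consistent with $q'$ already-seen elements that also contain one fixed unqueried symbol.

```python
from fractions import Fraction
from math import comb


def prob_unqueried_symbol_in_support(S, H, q_distinct):
    """Fraction of size-H subsets of an S-symbol alphabet that contain one fixed
    unqueried symbol, among subsets containing q_distinct known elements."""
    if q_distinct >= H:
        # Every element of the support is already known, so no new symbol fits.
        return Fraction(0)
    consistent = comb(S - q_distinct, H - q_distinct)
    containing = comb(S - q_distinct - 1, H - q_distinct - 1)
    return Fraction(containing, consistent)
```

For every admissible $q'$ the result equals $(H-q')/(S-q')$, which never exceeds $H/S$: seeing more samples only makes a guess outside the samples less likely to be in the support.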
3.3 Lower Bound for Unbounded Adversary

We now want to tie together Lemmas 1 and 2 to come up with a lower bound on the efficiency of
the stegoencoder in terms of rate, reliability, and security. Note that some work is needed, because
even though Lemma 1 is about reliability and Lemma 2 is about security, neither mentions the
parameters Rel and InSec.

Assume, for now, that the adversary can test whether $t_i$ is in the support of $D_i$. (This is not
possible if $D_i$ is completely random and the adversary's description is small compared to $S = |\Sigma|$;
however, it serves as a useful warm-up for the next section.) Then, using Lemma 2, it is easily
shown that, if the stegoencoder has insecurity $\epsilon$, then it cannot output something it did not receive
as response to a query with probability higher than $\epsilon/(1 - H/S)$. This leads to the following
theorem.

Theorem 1. Let $(SE, SD)$ be a black-box stegosystem with insecurity $\epsilon$ against an adversary who
has an oracle for testing membership in the support of $C$, unreliability $\rho$ and rate $w$ for an alphabet $\Sigma$
of size $S$. Then, for any positive integer $H < S$, there exists a channel with min-entropy $h = \log_2 H$
such that the probability that the encoder makes at most $N$ queries to send a random message of
length $lw$ is at most

\[ \left(\frac{Ne}{l2^w}\right)^l + \rho + R\epsilon , \]

and the expected number of queries per stegotext symbol is therefore at least

\[ \frac{2^w}{e}\left(\frac{1}{2} - \rho - R\epsilon\right) , \]

where $R = S/(S-H)$.

Note that, like Lemma 1, this theorem and Theorem 2 apply when a stegosystem is used to
encode messages drawn uniformly from the distribution of all $lw$-bit messages (see remark following
the proof of Lemma 1).

Proof. We define the following events, which are all subsets of $E \times \{0,1\}^* \times \{0,1\}^{lw} \times \{0,1\}^*$ (below
$v$ denotes the randomness of $SE$):

• "$SE$ makes few queries to encode $m$ under $K$": $Few = \{\vec{D},\vec{S},K,m,v \mid SE^{\vec{S}}(K,m;v)$
makes at most $N$ queries$\}$ (note that this is the event whose probability we want to bound)

• "$SE$ outputs a correct encoding of $m$ under $K$": $Corr = \{\vec{D},\vec{S},K,m,v \mid SD(K, SE^{\vec{S}}(K,m;v)) = m\}$

• "$m$ has an encoding $t$ under $K$, and this encoding has low weight": $Low = \{\vec{D},\vec{S},K,m,v \mid (\exists t)\;
SD(K,t) = m \wedge W(t,\vec{S}) \le N\}$

• $Ins$ and $Nq$ as in Lemma 2, but as subsets of $E \times \{0,1\}^* \times \{0,1\}^{lw} \times \{0,1\}^*$

Suppose that $SE$ outputs a correct encoding of a message $m$. In that case, the probability that it
made at most $N$ queries to the channel is upper bounded by the probability that: (i) there exists
an encoding of $m$ of weight at most $N$, or (ii) $SE$ output something it did not query. In other
words,

\[ \Pr[Few \mid Corr] \le \Pr[Low \mid Corr] + \Pr[Nq \mid Corr] . \]

Now we have

\begin{align*}
\Pr[Few] &= \Pr[Few \cap Corr] + \Pr[Few \cap \overline{Corr}] \\
&\le \Pr[Few \cap Corr] + \Pr[\overline{Corr}] \\
&= \Pr[Few \mid Corr] \cdot \Pr[Corr] + \Pr[\overline{Corr}] \\
&\le (\Pr[Low \mid Corr] + \Pr[Nq \mid Corr]) \cdot \Pr[Corr] + \Pr[\overline{Corr}] \\
&= \Pr[Low \cap Corr] + \Pr[Nq \cap Corr] + \Pr[\overline{Corr}] \\
&\le \Pr[Low] + \Pr[Nq] + \Pr[\overline{Corr}] .
\end{align*}

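Because insecurity is $\epsilon$, $\Pr[\overline{Ins}] \le \epsilon$. Hence,

\[ \Pr[Nq] = \frac{\Pr[\overline{Ins} \cap Nq]}{\Pr[\overline{Ins} \mid Nq]} = \frac{\Pr[\overline{Ins}]}{\Pr[\overline{Ins} \mid Nq]} \le \frac{\epsilon}{1 - H/S} \qquad (1) \]

(the second equality follows from the fact that if the encoder outputs something not in $\vec{D}$, then it
must have not queried it, i.e., $\overline{Ins} \subseteq Nq$; the inequality follows from Lemma 2).
By Lemma 1 we have

\[ \Pr[Low] \le \left(\frac{Ne}{l2^w}\right)^l . \qquad (2) \]

Now by combining (1), (2), and the fact that $\Pr[\overline{Corr}] \le \rho$ by reliability, we get that

\[ \Pr[Few] \le \left(\frac{Ne}{l2^w}\right)^l + \rho + \frac{\epsilon}{1 - H/S} . \]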

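Note that the probability is taken, in particular, over a random choice of $\vec{D}$. Therefore, it holds
for at least one flat $h$-channel.

Let random variable $q$ be equal to the number of queries made by $SE$ to encode $m$ under $K$.
Then, letting $a = l2^w/e$ and $c = 1 - \rho - \frac{\epsilon}{1-H/S}$, we get

\[ E[q] = \sum_{N \ge 0} \Pr[q > N] \ge \sum_{N=0}^{a} \left(c - \left(\frac{N}{a}\right)^l\right) \ge \sum_{N=0}^{a} \left(c - \frac{N}{a}\right) \ge a\left(c - \frac{1}{2}\right) . \]

The expected number of queries per document sent is $E[q]/l$ and so is at least $\left(\frac{1}{2} - \rho - \frac{\epsilon}{1-H/S}\right)(2^w/e)$.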

□ 
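To get a feel for the probability bound established in this proof, it can be evaluated directly. The helper below is a sketch for illustration; its name and argument order are ours, and it simply computes $(Ne/(l2^w))^l + \rho + \epsilon/(1 - H/S)$ for given parameters.

```python
import math


def few_queries_bound(N, l, w, rho, eps, S, H):
    """Upper bound on Pr[encoder makes at most N queries] for an lw-bit message.

    N: query budget, l: number of stegotext symbols, w: rate in bits per symbol,
    rho: unreliability, eps: insecurity, S: alphabet size, H: support size (H < S).
    """
    R = S / (S - H)  # same as 1 / (1 - H/S)
    return (N * math.e / (l * 2 ** w)) ** l + rho + R * eps
```

The first term stays far below 1 until $N$ approaches $l2^w/e$, so an encoder that usually stays within a much smaller budget must pay for it in unreliability or insecurity.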



3.4 Lower Bound for Computationally Bounded Parties 

We now want to establish the same lower bound without making such a strong assumption about
the security of the stegosystem. Namely, we do not want to assume that the insecurity $\epsilon$ is low unless
the adversary's description size and running time are feasible ("feasible," when made rigorous, will
mean some fixed polynomial in the description size and running time of the stegoencoder and in a
security parameter for a function that is pseudorandom against the stegoencoder). Recall that our
definitions allow the adversary to depend on the channel; thus, our goal is to construct channels
that have short descriptions for the adversary but look like random flat $h$-channels to the black-box
stegoencoder. In other words, we wish to replace a random flat $h$-channel with a pseudorandom
one.
We note that the channel is pseudorandom only in the sense that it has a short description, so as
to allow the adversary to be computationally bounded. The min-entropy guarantee, however, cannot
be replaced with a "pseudo-guarantee": otherwise the encoder is being lied to, and our lower bound
is no longer meaningful. Thus, a simpleminded approach, such as using a pseudorandom predicate
with bias $H/S$ applied to each symbol and history length to determine whether the symbol is in
the support of the channel, will not work here: because $S$ is constant, eventually (for some history
length) the channel will have lower than guaranteed min-entropy (moreover, we do not wish to
assume that $S$ is large in order to demonstrate that this is unlikely to happen; our lower bound
should work for any alphabet). Rather, we need the pseudorandom implementation of the channel
to be truthful¹ in the sense of [8], and so rely on the techniques developed therein.

The result is the following theorem, which is similar to Theorem 1 except for a small term
introduced by pseudorandomness of the channel.

Theorem 2. There exist polynomials $p_1, p_2$ and constants $c_1, c_2$ with the following property. Let
$ST(\kappa)$ be a black-box stegosystem with security parameter $\kappa$, description size $\delta$, unreliability $\rho$,
rate $w$, and running time $\tau$ for the alphabet $\Sigma = \{0, 1, \ldots, S-1\}$. Assume that there exists
a pseudorandom function family $\mathcal{F}(n)$ with insecurity $InSec^{PRF}_{\mathcal{F}(n)}(t, d, q)$. Then, for any positive
integer $H < S$, there exists a channel $C$ with min-entropy $h = \log_2 H$ such that the probability that
the encoder makes at most $N$ queries to send a random message of length $lw$ is upper bounded by

\[ \left(\frac{Ne}{l2^w}\right)^l + \rho + R\epsilon + (R+1)\left(InSec^{PRF}_{\mathcal{F}(n)}(p_1(\tau,n), \delta + c_1, p_1(\tau,n)) + \tau 2^{-n}\right) , \]

and the expected number of queries per stegotext symbol is therefore at least

\[ \frac{2^w}{e}\left(\frac{1}{2} - \rho - R\epsilon - (R+1)\left(InSec^{PRF}_{\mathcal{F}(n)}(p_1(\tau,n), \delta + c_1, p_1(\tau,n)) + \tau 2^{-n}\right)\right) , \]

where $R = S/(S-H)$ and $\epsilon$ is the insecurity of the stegosystem $ST$ on the channel $C$ against adversaries
running in time $p_2(n, \log S, l)$ of description size $n + c_2$, making just one query of length $lw$ to $SE$
or $O$ (i.e., $\epsilon = InSec^{SS}_{ST(\kappa),C}(p_2(n, \log S, l), n + c_2, 1, lw)$).

Proof. The main challenge lies in formulating the analogue of Lemma 2 under computational
restrictions. Lemma 2 and its use in Theorem 1 relied on: (i) the inability of the encoder to predict
the behavior of the channel (because the channel is random) and (ii) the ability of the adversary
to test if a given string is in the support of the channel (which the adversary has because it is
unbounded). We need to mimic this in the computationally bounded case. We do so by constructing
a channel whose support (i) appears random to a bounded encoder, but (ii) has an efficient test of
membership that the adversary can perform given only a short advice. As already mentioned, we
wish to replace a random channel with a pseudorandom one and give the short pseudorandom seed
to the adversary, while keeping the min-entropy guarantee truthful.

The next few paragraphs will explain how this is done, using the techniques of huge random
objects from [8]. A reader not familiar with [8] may find it easier to skip to the paragraph entitled
"Properties of the Pseudorandom Flat $h$-Channels," where the results of this (i.e., the properties
of the channel that we obtain) are summarized.

¹ In this case, truthfulness implies that for each history length, the support of the channel has exactly $H$ elements.

Specifying and Implementing the Flat $h$-Channel. For the next few paragraphs, familiarity
with [8] will be assumed. Recall that [8] requires a specification of the object that will be
pseudorandomly implemented, in the form of a Turing machine with a countably infinite random tape.
It would be straightforward to specify the channel as a random object (random subset $D$ of $\Sigma$ of
size $H$) admitting two types of queries: "sample" and "test membership." But a pseudorandom
implementation of such an object would also replace random sampling with pseudorandom
sampling, whereas in a stegosystem the encoder is guaranteed a truly random sample from $D$ (indeed,
without such a guarantee, the min-entropy guarantee is no longer meaningful). Therefore, we need
to construct a slightly different random object, implement it pseudorandomly, and add random
sampling on top of it. We specify the random object as follows. Recall that $S = |\Sigma|$, $h$ is the
min-entropy, and $H = 2^h$.

Definition 3 (Specification of a flat $h$-channel). Let $M$ be a probabilistic Turing machine with
an infinite random tape $\omega$. On input five integers $(S, H, i, a, b)$ (where $0 < H < S$, $i > 0$,
$0 \le a \le b < S$), $M$ does the following:

• divides $\omega$ into consecutive substrings $y_1, y_2, \ldots$ of length $S$ each;

• identifies among them the substrings that have exactly $H$ ones; let $y$ be the $i$th such substring
(with probability one, there are infinitely many such substrings, of course);

• returns the number of ones in $y$ between, and including, positions $a$ and $b$ in $y$ (positions are
counted from $0$ to $S - 1$).

In what way does $M$ specify a flat $h$-channel? To see that, identify $\Sigma$ with $\{0, \ldots, S-1\}$,
and let $D_i$ be the subset of $\Sigma$ indicated by the ones in $y$. Then $D_i$ has cardinality $H$ and testing
membership in $D_i$ can be realized using a single query to $M$:

    insupp^M(i, s):
        return M(S, H, i, s, s)

Obviously, the $D_i$ are selected uniformly at random and independently of each other. Thus, this object
specifies the correct channel and allows membership testing.

We now use this object to allow for random sampling of $D_i$. Outputting a random element of
$D_i$ can be realized via $\log S$ queries to $M$, using the following procedure (essentially, binary search):

    rndelt^M(i):
        return random-element-in-range^M(S, H, i, 0, S − 1)

    random-element-in-range^M(S, H, i, a, b):
        if a = b then return a and terminate
        mid ← ⌊(a + b)/2⌋
        total ← M(S, H, i, a, b)
        left ← M(S, H, i, a, mid)
        r ←_R {1, ..., total}
        if r ≤ left then
            return random-element-in-range^M(S, H, i, a, mid)
        else
            return random-element-in-range^M(S, H, i, mid + 1, b)
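The membership test and the binary-search sampler can be tried out on a small explicit set. The following is a minimal sketch under stated assumptions: the helper `make_oracle` and the 0/1 list standing in for the string $y$ are ours, the oracle is fixed to a single $D_i$ (so the arguments $S$, $H$, $i$ are dropped from the call), and the recursion is written as a loop.

```python
import random


def make_oracle(bits):
    """Interval-count oracle for one fixed D_i, given as a list of 0/1 flags."""
    prefix = [0]
    for bit in bits:
        prefix.append(prefix[-1] + bit)

    def M(a, b):
        # Number of ones in positions a..b, inclusive.
        return prefix[b + 1] - prefix[a]

    return M


def insupp(M, s):
    """Membership test: a single query on the one-element range [s, s]."""
    return M(s, s) == 1


def rndelt(M, S, rng):
    """Uniform element of the support via binary search on interval counts."""
    a, b = 0, S - 1
    while a != b:
        mid = (a + b) // 2
        total = M(a, b)
        left = M(a, mid)
        r = rng.randint(1, total)  # uniform in {1, ..., total}
        if r <= left:
            b = mid        # descend into the left half with probability left/total
        else:
            a = mid + 1    # otherwise the right half holds at least one element
    return a
```

Each step keeps a range that contains at least one element of the support, and the probabilities along a path multiply to $1/H$, so every element is returned with equal probability.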
We can implement this random object pseudorandomly using the same techniques as [8] uses
for implementing random boolean functions with interval sums (see [8, Theorem 3.2]). Namely,
the authors of [8] give a construction of a truthful pseudo-implementation of a random object
determined by a random boolean function $f : \{0, \ldots, 2^n - 1\} \to \{0, 1\}$ that accepts queries in
the form of two $n$-bit integers $(a, b)$ and answers with $\sum_{j=a}^{b} f(j)$. Roughly, their construction
is as follows. Let $S = 2^n$. Imagine a full binary tree of depth $n$, whose leaves contain values
$f(0), f(1), \ldots, f(S-1)$. Any other node in the tree contains the sum of leaves in its subtree. Given
access to such a tree, we can compute any sum $f(a) + f(a+1) + \ldots + f(b)$ in time proportional
to $n$. Moreover, such trees need not be stored fully but can be evaluated dynamically, from the
root down to the leaves, as follows. The value in the root (i.e., the sum of all leaves) has binomial
distribution and can be filled in pseudorandomly. Other nodes have more complex distributions
but can also be filled in pseudorandomly and consistently, so that they contain the sums of their
leaves. The construction uses a pseudorandom function to come up with the value at each node.

We need to make three modifications. First, we simply fix the value in the root to $H$, so that
$f(0) + f(1) + \ldots + f(S-1) = H$. Second, we allow $S$ to be not a power of 2. Third, in order to
create multiple distributions $D_i$, we add $i$ as an input to the pseudorandom function, thus getting
different (and independent-looking) randomness for each $D_i$.

Having made these modifications, we obtain a truthful pseudo-implementation of M. It can be 
used within insupp and rndelt instead of M, for efficient membership testing and truly random 
sampling from our pseudorandom channel. 
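The lazily evaluated tree with a fixed root can be sketched in a few lines. This is an illustration under loud assumptions, not the construction of [8]: HMAC-SHA256 stands in for the pseudorandom function, Python's `random.Random` expands each node's PRF output into coins, the split of a node's count between its children is drawn by a simple sequential urn process, and all names are ours. A query here costs time proportional to $H \log S$, whereas the actual construction is far more efficient.

```python
import hashlib
import hmac
import random


def _node_rng(seed, i, lo, hi):
    # Per-node coins derived from the seed, the channel index i, and the node's range.
    tag = hmac.new(seed, f"{i}:{lo}:{hi}".encode(), hashlib.sha256).digest()
    return random.Random(tag)


def _left_count(seed, i, lo, hi, ones):
    """How many of the `ones` inside [lo, hi] fall into the left half [lo, mid]."""
    mid = (lo + hi) // 2
    rng = _node_rng(seed, i, lo, hi)
    remaining, left_slots, left = hi - lo + 1, mid - lo + 1, 0
    for _ in range(ones):
        # Place the ones one at a time into the free positions (hypergeometric split).
        if rng.randrange(remaining) < left_slots:
            left += 1
            left_slots -= 1
        remaining -= 1
    return left


def count_ones(seed, S, H, i, a, b):
    """Number of elements of the i-th subset inside [a, b]; plays the role of M."""
    def walk(lo, hi, ones):
        if ones == 0 or b < lo or hi < a:
            return 0
        if a <= lo and hi <= b:
            return ones
        mid = (lo + hi) // 2
        left = _left_count(seed, i, lo, hi, ones)
        return walk(lo, mid, left) + walk(mid + 1, hi, ones - left)

    return walk(0, S - 1, H)  # the root value is fixed to H, not sampled
```

Because the root is pinned to $H$ and every node's children sum to the node's own value, each subset has exactly $H$ elements no matter what the seed is, which is the truthfulness property the lower bound depends on. $S$ need not be a power of 2.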

Properties of the Pseudorandom Flat $h$-Channels. We thus obtain that, given a short
random seed $\omega$, it is possible to create a flat $h$-channel that is indistinguishable from random and
allows for efficient membership testing and truly random sampling given $\omega$. To emphasize the
pseudorandomness of the channel, in our notation we will use $DPR$ instead of $D$ and keep the seed
$\omega$ explicit as a superscript. Thus, $DPR^{\omega}_i$ is a pseudorandom subset of $\Sigma$ of size $H$, and the channel
is denoted by $\overrightarrow{DPR}^{\omega} = DPR^{\omega}_1 \times DPR^{\omega}_2 \times \ldots$. Similarly to $E$ defined in Section 3.2 for truly random
channels, define $EPR_n = \{(\omega, \vec{S}) \mid |\omega| = n, s_{i,j} \in DPR^{\omega}_i\}$.

Because $\overrightarrow{DPR}^{\omega}$ has the requisite min-entropy, it is valid to expect proper performance of the
stegoencoder on it; because it is pseudorandom, an analog of Lemma 2 will still hold; and because
it has efficient membership testing given a short seed, the adversary will be able to see if an output
of the stegoencoder is not from it.

We are now ready to formally state the claim about the properties of $\overrightarrow{DPR}^{\omega}$. For this claim,
and for the rest of the proof, we assume existence of a family of pseudorandom functions $\mathcal{F}(n)$ with
insecurity $InSec^{PRF}_{\mathcal{F}(n)}(t, d, q)$ (recall that InSec is a bound on the distinguishing advantage of any
adversary running in time at most $t$ of description size at most $d$ making at most $q$ queries). To
simplify the notation, we will note that for us $d$ always will be at most the description size of the
stegosystem plus some constant $c_1$, and that $q \le t$. We will then write $\iota_{PRF}(n, t)$ instead of
$InSec^{PRF}_{\mathcal{F}(n)}(t, d, q)$.

Claim 1. There is a polynomial $p$ and a family of channels $\overrightarrow{DPR}^{\omega}$, indexed by a string $\omega$ of length
$n$ (as well as values $H$ and $S$), such that, for any positive integers $n$, $i$ and $H < S$, channel $\overrightarrow{DPR}^{\omega}$
has the following properties:

• it is a flat $h$-channel for $h = \log H$ on the alphabet $\{0, \ldots, S-1\}$;

• it allows for sampling and membership testing in time polynomial in $n$, $\log S$, and $\log i$ given
$\omega$, $i$, $H$, and $S$ as inputs;

• it is pseudorandom in the following sense: for any $H$, $S$, and any oracle machine (distinguisher)
$A$ with running time $\tau \ge \log S$,

\[ \left| \Pr_{(\vec{D},\vec{S}) \leftarrow E}\left[A^{\vec{S},Memb(\vec{D})}() = 1\right] - \Pr_{(\omega,\vec{S}) \leftarrow EPR_n}\left[A^{\vec{S},Memb(\omega)}() = 1\right] \right| \le \iota_{PRF}(n, p(\tau, n)) + \tau 2^{-n} , \]

where $Memb(\vec{D})$ and $Memb(\omega)$ denote membership testing oracles for $\vec{D}$ and $\overrightarrow{DPR}^{\omega}$,
respectively.

The claim follows from the results of [8] with minor modifications, as presented above. We
present no proof here.

Note that the second argument to $\iota_{PRF}$ depends on $S$ only to the extent $\tau$ does; this is important,
because, even for large alphabets and high-entropy channels, we want to keep the second argument
to $\iota_{PRF}$ as low as possible so that $\iota_{PRF}$ is as low as possible.



Stegosystems Running with DPR Almost Always Output Query Answers. Having built
pseudorandom channels, we now state the analog of Lemma 2 that works for stegosystems secure
only against bounded adversaries. Fix some $H$ and $S$. Let $A$ be the same as in Lemma 2, but given
access to $\overrightarrow{DPR}^{\omega}$ instead of $\vec{D}$, and let $t = t_1 \ldots t_l$ be its output and $Q_i$ be the set of responses $A$
received to its queries of the $i$th channel $DPR^{\omega}_i$. Analogously to $Nq$ and $Ins$, define the following
two families of events, indexed by $n$, the security parameter for the PRF.

• nonqueried, pseudorandom version: $NqPR_n = \{(\omega,\vec{S}) \in EPR_n \mid (\exists i)\, t_i \notin Q_i\}$

• in support, pseudorandom version: $InsPR_n = \{(\omega,\vec{S}) \in EPR_n \mid t \in \overrightarrow{DPR}^{\omega}\}$

We show that high probability of $InsPR_n$ implies low probability of $NqPR_n$. Formal statement
of the lemma follows. To simplify the notation, let $R = S/(S-H)$.

Lemma 3. There exists a polynomial $p_1$ such that, for any $A$ running in time $\tau \ge \log S$, if
$\Pr[\overline{InsPR_n}] \le \epsilon(n)$, then

\[ \Pr[NqPR_n] \le R\epsilon(n) + (R+1)\left(\iota_{PRF}(n, p_1(\tau, n)) + \tau 2^{-n}\right) . \]

Proof. Let $Ins$ and $Nq$ be the same as in Lemma 2. Let $A'$ be a machine that is given an oracle
which tests membership in the channel. Let $A'$ run $A$ to get $t$ and output 1 if and only if the
membership oracle says that $t$ is in the channel. Applying Claim 1 to $A'$, we have that for some
polynomial $p'$ (namely, the polynomial $p(\tau + t_{A'}(\tau), n)$, where $t_{A'}$ is the extra time that $A'$ needs
to run after $A$ is finished),

\[ \left| \Pr[InsPR_n] - \Pr[Ins] \right| \le \iota_{PRF}(n, p'(\tau, n)) + \tau 2^{-n} . \]

Therefore $\Pr[\overline{Ins}] \le \epsilon(n) + \iota_{PRF}(n, p'(\tau, n)) + \tau 2^{-n}$. It now follows, by the same derivation
as for Equation (1) in the proof of Theorem 1, that

\[ \Pr[Nq] \le \frac{\epsilon(n) + \iota_{PRF}(n, p'(\tau, n)) + \tau 2^{-n}}{1 - H/S} . \]

Let $A''$ be a machine that runs $A$ and outputs 1 if and only if $A$ outputs something it did not
receive as a query response. Applying Claim 1 to $A''$, we get that, for some polynomial $p''$ (namely,
the polynomial $p(\tau + t_{A''}(\tau), n)$, where $t_{A''}$ is the extra time that $A''$ needs to run in addition to
$A$), $|\Pr[NqPR_n] - \Pr[Nq]| \le \iota_{PRF}(n, p''(\tau, n)) + \tau 2^{-n}$. Therefore,

\[ \Pr[NqPR_n] \le R\epsilon(n) + R\,\iota_{PRF}(n, p'(\tau, n)) + \iota_{PRF}(n, p''(\tau, n)) + (1 + R)\tau 2^{-n} . \]

Now let $p_1 \ge \max(p', p'')$. □



Completing the Proof. We are now ready to prove Theorem 2. We define the same events as
in the proof of Theorem 1, except as subsets of $EPR_n \times \{0,1\}^* \times \{0,1\}^{lw} \times \{0,1\}^*$ rather than
$E \times \{0,1\}^* \times \{0,1\}^{lw} \times \{0,1\}^*$ (we use the suffix PR to emphasize that they are for the pseudorandom
channel): $FewPR_n$, $CorrPR_n$, $LowPR_n$ denote, respectively, that $SE$ made at most $N$ queries, that
$SD$ correctly decoded the hiddentext, and that the hiddentext has a low-weight encoding.

Just like in the proof of Theorem 1, it holds that $\Pr[FewPR_n] \le \Pr[LowPR_n] + \Pr[NqPR_n] + \Pr[\overline{CorrPR_n}]$
and that $\Pr[\overline{CorrPR_n}] \le \rho$ and $\Pr[LowPR_n] \le (Ne/(l2^w))^l$. It is left to argue a bound on $\Pr[NqPR_n]$.

Consider an adversary against our stegosystem that contains $\omega$ as part of its description, gives its
oracle a random message to encode, and then tests if the output is in $\overrightarrow{DPR}^{\omega}$. It can be implemented
to run in $p_2(n, \log S, l)$ steps for some polynomial $p_2$ and has description size $n + c_2$ for some
constant $c_2$. Hence, its probability of detecting a stegoencoder output that is not in $\overrightarrow{DPR}^{\omega}$ cannot
be more than the insecurity $\epsilon = InSec^{SS}_{ST(\kappa),\overrightarrow{DPR}^{\omega}}(p_2(n, \log S, l), n + c_2, 1, lw)$. In other words,
$\Pr[\overline{InsPR_n}] \le \epsilon$, and, by Lemma 3, we get

\[ \Pr[NqPR_n] \le R\epsilon + (R+1)\left(\iota_{PRF}(n, p_1(\tau, n)) + \tau 2^{-n}\right) . \]

Finally, to compute a bound on the expected value, we apply the same method as in the proof
of Theorem 1. □



Discussion. The proof of Theorem 2 relies fundamentally on Theorem 1; specifically, Lemma 3
relies on Lemma 2. In other words, to prove a lower bound in the computationally bounded setting,
we use the corresponding lower bound in the information-theoretic setting. To do so, we replace an
object of an exponentially large size (the channel) with one that can be succinctly described. This
replacement substitutes some information-theoretic properties with their computational
counterparts. However, for a lower bound to remain "honest" (i.e., not restricted to uninteresting channels),
some global properties must remain information-theoretic. This is where the truthfulness of huge
random objects of [8] comes to the rescue. We hope that other interesting impossibility results
can be proved in a similar fashion by adapting an information-theoretic result using the paradigm
of [8]. We think truthfulness of the objects will be important in such adaptations for the same
reason it was important here.

Note that the gap in the capabilities of the adversary and encoder/decoder is different in the
two settings: in the information-theoretic case, the adversary is given unrestricted computational
power, while in the computationally bounded case, it is assumed to run in polynomial time but is
given the secret channel seed. However, in the information-theoretic case, we may remove the gap
altogether by providing both the adversary and the encoder/decoder with a channel membership
oracle and still obtain a lower bound analogous² to that of Theorem 2. We see no such opportunity
to remove the gap in the computationally bounded case (e.g., equipping the encoder/decoder with
the channel seed seems to break our proof). Removing this asymmetry in the computationally
bounded case seems challenging and worth pursuing.

² The lower bound on the number of samples per document sent becomes trivially zero if the encoder is given as
much time as it pleases, in addition to the membership oracle of the flat channel. Yet it should not be difficult to
prove that it must then run for $O(2^w)$ steps per document sent.



4 The Stateful Construction STF 



The construction STF relies on a pseudorandom function family $\mathcal{F}$. In addition to the security
parameter $\kappa$ (the length of the PRF key $K$), it depends on the rate parameter $w$. Because it is
stateful, both encoder and decoder take a counter $ctr$ as input.

Our encoder is similar to the rejection-sampler-based encoder of [11] generalized to $w$ bits:
it simply samples elements from the channel until the pseudorandom function evaluated on the
element produces the $w$-bit symbol being encoded. The crucial difference of our construction is
the following: to avoid introducing bias into the channel, if the same element is sampled twice, the
encoder simply flips a random coin to decide whether to output that element with probability $2^{-w}$.
Hopper [12, Construction 6.10] independently proposes a similar construction, except instead of
flipping a fresh random coin, the encoder evaluates the pseudorandom function on a new counter
value (there is a separate counter associated to each sampled document, indicating how many times
the document has been sampled), thus conserving randomness.

Observe that, assuming $\mathcal{F}$ is truly random rather than pseudorandom, each sample from the
channel has probability $2^{-w}$ of being output, independent of anything else, because each time
fresh randomness is being used. Of course, this introduces unreliability, which is related to the
probability of drawing the same element from $D_{\mathcal{H}}$ twice.



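Procedure STF.SE($K$, $w$, $m$, $ctr$):
    Let $m = m_1 m_2 \ldots m_l$, where $|m_i| = w$
    for $i \leftarrow 1$ to $l$:
        $j \leftarrow 0$; $f \leftarrow 0$; $ctr \leftarrow ctr + 1$
        repeat:
            $j \leftarrow j + 1$
            $s_{i,j} \leftarrow M(\mathcal{H})$
            if $\exists j' < j$ s.t. $s_{i,j} = s_{i,j'}$
                let $c \in_R \{0,1\}^w$
                if $c = m_i$ then $f \leftarrow 1$
            else if $F_K(ctr, s_{i,j}) = m_i$
                then $f \leftarrow 1$
        until $f = 1$
        $s_i \leftarrow s_{i,j}$; $\mathcal{H} \leftarrow \mathcal{H} \| s_i$
    output $s = s_1 s_2 \ldots s_l$

Procedure STF.SD($K$, $w$, $s$, $ctr$):
    Let $s = s_1 \ldots s_l$, where $s_i \in \Sigma$
    for $i = 1$ to $l$
        $ctr \leftarrow ctr + 1$
        $m_i \leftarrow F_K(ctr, s_i)$
    output $m = m_1 m_2 \ldots m_l$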
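The two STF procedures translate almost line for line into runnable code. The sketch below rests on assumptions that are ours rather than the paper's: HMAC-SHA256 reduced modulo $2^w$ stands in for $F_K$, the channel oracle $M$ is any callable `sample(history)`, hiddentext symbols are integers in $[0, 2^w)$, and all function names are hypothetical.

```python
import hashlib
import hmac
import random


def prf(key, w, *parts):
    """Stand-in for F_K: HMAC-SHA256 of the inputs, reduced to a w-bit integer."""
    msg = "|".join(str(p) for p in parts).encode()
    digest = hmac.new(key, msg, hashlib.sha256).digest()
    return int.from_bytes(digest, "big") % (1 << w)


def stf_encode(key, w, symbols, ctr, sample, history, rng):
    """Encode a list of w-bit symbols; returns (stegotext, updated counter)."""
    stego = []
    for m_i in symbols:
        ctr += 1
        seen = set()  # documents already drawn for this symbol
        while True:
            s = sample(history)
            if s in seen:
                # Repeated draw: use fresh coins instead of re-evaluating the PRF.
                accept = rng.randrange(1 << w) == m_i
            else:
                seen.add(s)
                accept = prf(key, w, ctr, s) == m_i
            if accept:
                break
        stego.append(s)
        history = history + [s]
    return stego, ctr


def stf_decode(key, w, stego, ctr):
    """Decode by evaluating the PRF on each received document."""
    symbols = []
    for s in stego:
        ctr += 1
        symbols.append(prf(key, w, ctr, s))
    return symbols, ctr
```

Both parties must start from the same counter value. Decoding can fail only when the encoder outputs a document on a repeated draw, which is why unreliability is tied to the collision probability of the channel.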



Theorem 3. The stegosystem STF has insecurity $InSec^{SS}_{STF(\kappa,w),C}(t, d, l, lw) = InSec^{PRF}_{\mathcal{F}(\kappa)}(t + O(1),
d + O(1), l2^w)$. For each $i$, the probability that $s_i$ is decoded incorrectly is $2^{-h+w} + InSec^{PRF}_{\mathcal{F}(\kappa)}(2^w,
O(1), 2^w)$, and unreliability is at most $l(2^{-h+w} + InSec^{PRF}_{\mathcal{F}(\kappa)}(2^w, O(1), 2^w))$.

Proof. The insecurity bound is apparent from the fact that if $\mathcal{F}$ were truly random, then the system
would be perfectly secure, because its output is distributed identically to $C$ (simply because the
encoder samples from the channel and independently at random decides which sample to output,
because the random function is never applied more than once to the same input). Hence, any
adversary for the stegosystem would distinguish $\mathcal{F}$ from random.

The reliability bound per symbol can be demonstrated as follows. Assuming that $\mathcal{F}$ is random,
the probability that $f$ becomes 1 after $j$ iterations of the inner loop in STF.SE (i.e., that $s_i = s_{i,j}$)
is $(1 - 2^{-w})^{j-1} 2^{-w}$. If that happens, the probability that $\exists j' < j$ such that $s_{i,j} = s_{i,j'}$ is at most
$(j-1)2^{-h}$. Summing up and using standard formulas for geometric series, we get

\[ \sum_{j=1}^{\infty} (j-1)2^{-h}(1 - 2^{-w})^{j-1}2^{-w} = 2^{-h-w}\sum_{j=1}^{\infty}\left((1 - 2^{-w})^{j}\sum_{k=0}^{\infty}(1 - 2^{-w})^{k}\right) < 2^{w-h} . \]

□ 

Note that errors are independent for each symbol, and hence error-correcting codes over alphabet
of size $2^w$ can be used to increase reliability: one simply encodes $m$ before feeding it to $SE$. Observe
that, for a truly random $\mathcal{F}$, if an error occurs in position $i$, the symbol decoded is uniformly
distributed among all elements of $\{0,1\}^w - \{m_i\}$. Therefore, the stegosystem creates a $2^w$-ary
symmetric channel with error probability $2^{w-h}(1 - 2^{-w}) = 2^{-h}(2^w - 1)$ (this comes from more
careful summation in the above proof). Its capacity is $w - H[1 - 2^{-h}(2^w - 1), 2^{-h}, 2^{-h}, \ldots, 2^{-h}]$
(where $H$ is Shannon entropy of a distribution) [16, p. 58]. This is equal to $w + (2^w - 1)2^{-h}\log 2^{-h} +
(1 - 2^{-h}(2^w - 1))\log(1 - 2^{-h}(2^w - 1))$. Assuming that the error probability $2^{-h}(2^w - 1) \le 1/2$ and
using $\log(1 - x) \ge -2x$ for $0 \le x \le 1/2$, we get that the capacity of the channel created by the
encoder is at least $w + 2^{-h}(2^w - 1)(-h - 2) \ge w - (h + 2)2^{-h+w}$. Thus, as $l$ grows, we can achieve
rates close to $w - (h + 2)2^{-h+w}$ with near perfect security and reliability (independent of $h$).
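The capacity expression and its simpler lower bound can be compared numerically. The sketch below is illustrative, with function names of our choosing; logarithms are base 2, and the second function is only meaningful when the error probability $2^{-h}(2^w - 1)$ is at most $1/2$.

```python
import math


def stf_capacity(w, h):
    """Capacity of the 2^w-ary symmetric channel with error probability 2^-h (2^w - 1)."""
    p_err = 2.0 ** (-h) * (2 ** w - 1)
    # Contribution of the 2^w - 1 wrong symbols, each decoded with probability 2^-h.
    cap = w + (2 ** w - 1) * 2.0 ** (-h) * math.log2(2.0 ** (-h))
    if p_err < 1:
        # Contribution of the correctly decoded symbol.
        cap += (1 - p_err) * math.log2(1 - p_err)
    return cap


def stf_capacity_lower_bound(w, h):
    """The simplified bound w - (h + 2) 2^(-h + w)."""
    return w - (h + 2) * 2.0 ** (-h + w)
```

For example, with $w = 4$ and $h = 16$ both values are within about 0.005 bits of the full rate $w$, which shows how little is lost once $h$ is well above $w$.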



4.1 Stateless Variants of STF 

Our stegosystem STF is stateful because we need $F$ to take $ctr$ as input to make sure we never apply
the pseudorandom function more than once to the same input. This will happen automatically,
without the need for $ctr$, if the channel $C$ has the following property: for any histories $\mathcal{H}$ and
$\mathcal{H}'$ such that $\mathcal{H}$ is the prefix of $\mathcal{H}'$, the supports of $D_{\mathcal{H}}$ and $D_{\mathcal{H}'}$ do not intersect. For instance,
when documents have monotonically increasing sequence numbers or timestamps, no shared state
is needed.

To remove the need for shared state for all channels, we can do the following. We remove
$ctr$ as an input to $F$ and instead provide STF.SE with the set $Q$ of all values received so far as
answers from $M$. We replace the line "if $\exists j' < j$ s.t. $s_{i,j} = s_{i,j'}$" with "if $s_{i,j} \in Q$" and add the
line "$Q \leftarrow Q \cup \{s_{i,j}\}$" before the end of the inner loop. Now shared state is no longer needed for
security, because we again get fresh coins on each draw from the channel, even if it collides with a
draw made for a previous hiddentext symbol. However, reliability suffers, because the larger $l$ is,
the more likely a collision will happen. A careful analysis, omitted here, shows that unreliability is
$l^2 2^{-h+w}$ (plus the insecurity of the PRF).

Unfortunately, this variant requires the encoder to store the set $Q$ of all the symbols ever sampled
from $C$. Thus, while it removes shared state, it requires a lot of private state. This storage can be
reduced somewhat by use of Bloom filters [2] at the expense of introducing potential false collisions
and thus further decreasing reliability. An analysis utilizing the bounds of [3] (omitted here) shows
that using a Bloom filter with $(h - w - \log l)/\ln 2$ bits per entry will increase unreliability by only
a factor of 2, while potentially reducing storage significantly (because the symbols of $\Sigma$ require at
least $h$ bits to store and possibly more if the $D_{\mathcal{H}}$ is sparse).

5 The Stateless Construction STL



The stateless construction STL is simply STF without the counter and collision detection (and is a
generalization to rate $w$ of the construction that appeared in the extended abstract of [11]). Again,
we emphasize that the novelty is not in the construction but in the analysis. The construction
requires a reliability parameter $k$ to make sure that expected running time of the encoder does not
become infinite due to a low-probability event of infinite running time.

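Procedure STL.SE($K$, $w$, $k$, $m$, $\mathcal{H}$):
    Let $m = m_1 \ldots m_l$, where $|m_i| = w$
    for $i \leftarrow 1$ to $l$:
        $j \leftarrow 0$
        repeat:
            $j \leftarrow j + 1$
            $s_{i,j} \leftarrow M(\mathcal{H})$
        until $F_K(s_{i,j}) = m_i$ or $j = k$
        $s_i \leftarrow s_{i,j}$; $\mathcal{H} \leftarrow \mathcal{H} \| s_i$
    output $s = s_1 s_2 \ldots s_l$

Procedure STL.SD($K$, $w$, $s$):
    Let $s = s_1 \ldots s_l$, where $s_i \in \Sigma$
    for $i = 1$ to $l$
        $m_i \leftarrow F_K(s_i)$
    output $m = m_1 m_2 \ldots m_l$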
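For comparison with STF, here is the stateless pair as a runnable sketch. The same caveats apply: HMAC-SHA256 reduced modulo $2^w$ is our stand-in for $F_K$, `sample(history)` is a hypothetical channel oracle, symbols are integers in $[0, 2^w)$, and the function names are ours.

```python
import hashlib
import hmac
import random


def prf(key, w, document):
    """Stand-in for F_K: HMAC-SHA256 of the document, reduced to a w-bit integer."""
    digest = hmac.new(key, str(document).encode(), hashlib.sha256).digest()
    return int.from_bytes(digest, "big") % (1 << w)


def stl_encode(key, w, k, symbols, sample, history):
    """Rejection-sample each symbol, giving up after k draws (k >= 1)."""
    stego = []
    for m_i in symbols:
        for _ in range(k):
            s = sample(history)
            if prf(key, w, s) == m_i:
                break
        # After k failed draws the last sample is sent anyway and will decode wrongly.
        stego.append(s)
        history = history + [s]
    return stego


def stl_decode(key, w, stego):
    """Decode by evaluating the PRF on each received document; no shared state."""
    return [prf(key, w, s) for s in stego]
```

The cap $k$ bounds the running time at the cost of a decoding error with probability about $(1 - 2^{-w})^k$ per symbol, which is where the $e^{-k/2^w}$ term in the analysis comes from.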

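Theorem 4. The stegosystem STL has insecurity

\[ InSec^{SS}_{STL(\kappa,w,k),C}(t, d, l, lw) \in O\left(2^{-h+2w}l^2 + le^{-k/2^w}\right) + InSec^{PRF}_{\mathcal{F}(\kappa)}(t + O(1), d + O(1), l2^w) . \]

More precisely,

\[ InSec^{SS}_{STL(\kappa,w,k),C}(t, d, l, lw) \le 2^{-h}\left(l(l+1)2^{2w} - l(l+3)2^w + 2l\right) + 2le^{-k/2^w} + InSec^{PRF}_{\mathcal{F}(\kappa)}(t + 1, d + O(1), l2^w) . \]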



Proof. The proof of Theorem 4 consists of a hybrid argument. The first step in the hybrid argument
is to replace the stegoencoder $SE$ with $SE_1$, which is the same as $SE$, except that it uses a
truly random $G$ instead of pseudorandom $F$, which accounts for the term $InSec^{PRF}_{\mathcal{F}(\kappa)}(t + O(1), d +
O(1), l2^w)$. Then, rather than consider directly the statistical difference between $C$ and the output
of $SE_1$ on an $lw$-bit message, we bound it via a series of steps involving related stegoencoders (these
are not encoders in the sense defined in Section 2, as they do not have corresponding decoders;
they are simply related procedures that help in the proof).

The encoders $SE_2$, $SE_3$, and $SE_4$ are specified in Figure 1. $SE_2$ is the same as $SE_1$, except that
it maintains a set $Q$ of all answers received from $M$ so far. After receiving an answer $s_{i,j} \leftarrow M(\mathcal{H})$,
it checks if $s_{i,j} \in Q$; if so, it aborts and outputs "Fail"; else, it adds $s_{i,j}$ to $Q$. It also aborts
and outputs "Fail" if $j$ ever reaches $k$ during an execution of the inner loop. $SE_3$ is the same as
$SE_2$, except that instead of thinking of the random function $G$ as being fixed beforehand, it creates
$G$ "on the fly" by repeatedly flipping coins to decide the $w$-bit value assigned to $s_{i,j}$. Since, like
$SE_2$, it aborts whenever a collision between strings of covertexts occurs, the function will remain
consistent. Finally, $SE_4$ is the same as $SE_3$, except that it never aborts with failure.

SE2(K, w, k, m_1 … m_l, ℋ):
  for i ← 1 to l:
    j ← 0
    repeat:
      j ← j + 1
      s_{i,j} ← M(ℋ)
      if s_{i,j} ∈ Q or j = k + 1 then
        abort and output "Fail"
      Q ← Q ∪ {s_{i,j}}
    until G(s_{i,j}) = m_i
    s_i ← s_{i,j}; ℋ ← ℋ || s_i
  output s = s_1 s_2 … s_l

SE3(K, w, k, m_1 … m_l, ℋ):
  for i ← 1 to l:
    j ← 0
    repeat:
      j ← j + 1
      s_{i,j} ← M(ℋ)
      if s_{i,j} ∈ Q or j = k + 1 then
        abort and output "Fail"
      Q ← Q ∪ {s_{i,j}}
      Pick c ∈_R {0, 1}^w
    until c = m_i
    s_i ← s_{i,j}; ℋ ← ℋ || s_i
  output s = s_1 s_2 … s_l

SE4(K, w, k, m_1 … m_l, ℋ):
  for i ← 1 to l:
    j ← 0
    repeat:
      j ← j + 1
      s_{i,j} ← M(ℋ)
      Pick c ∈_R {0, 1}^w
    until c = m_i
    s_i ← s_{i,j}; ℋ ← ℋ || s_i
  output s = s_1 s_2 … s_l

Figure 1: "Encoders" SE2, SE3, and SE4 used in the proof of the theorem.

In a sequence of lemmas, we bound the statistical difference between the outputs of SE1 and
SE2; show that it is the same as the statistical difference between the outputs of SE3 and SE4;
and show that the outputs of SE2 and SE3 are distributed identically. Finally, observe that SE4
does nothing more than sample from the channel and then randomly, and obliviously to the sample,
keep or discard it. Hence, its output is distributed identically to the channel. The details of the
proof follow.
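The observation about SE4 can be checked empirically. The following is a minimal sketch, not part of the proof: it assumes a toy memoryless channel over three documents, and the names `se4_block`, `demo`, and the document weights are illustrative choices rather than anything defined in the paper. Because the coin flips that decide whether to keep a sample never look at the sample itself, the kept document should follow the channel distribution.

```python
import random
from collections import Counter

def se4_block(sample_channel, w, m_i, rng):
    """Encode one block as SE4 does: draw s, then pick c independently of s."""
    while True:
        s = sample_channel(rng)      # s <- M(H); the toy channel ignores history
        c = rng.getrandbits(w)       # w fresh coin flips, oblivious to s
        if c == m_i:                 # keep s only if the coins happen to equal m_i
            return s

def demo(trials=20000, seed=0):
    """Return empirical frequencies of the documents output by se4_block."""
    rng = random.Random(seed)
    docs, weights = ["a", "b", "c"], [0.5, 0.3, 0.2]   # assumed toy channel
    channel = lambda r: r.choices(docs, weights)[0]
    counts = Counter(se4_block(channel, 2, 1, rng) for _ in range(trials))
    return {d: counts[d] / trials for d in docs}

if __name__ == "__main__":
    print(demo())
```

With enough trials, the reported frequencies should sit close to the assumed weights 0.5, 0.3, and 0.2, whatever message block `m_i` is chosen.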

For ease of notation, we will denote $2^{-h}$ (the upper bound on the probability of elements of
$D_{\mathcal{H}}$) by $p$ and $2^{w}$ by $R$ for the rest of this proof.

The following proposition serves as a warm-up for the proof of Lemma 4, which follows it.

Proposition 1. The statistical difference between the output distributions of SE1 and SE2 for a
$w$-bit hiddentext message $m \in \{0,1\}^{w}$ is at most $2p(R-1)^2 + 2e^{-k/R}$. That is,

\[
\sum_{s} \left| \Pr_{G,M}\left[\mathrm{SE}_1(K,w,k,m,\mathcal{H}) \to s\right] - \Pr_{G,M}\left[\mathrm{SE}_2(K,w,k,m,\mathcal{H}) \to s\right] \right| \le 2p(R-1)^2 + 2e^{-k/R} .
\]



Proof. Consider the probability that SE2 outputs "Fail" while trying to encode some $m \in \{0,1\}^{w}$.
This happens for one of two reasons. First, if after $k$ attempts to find $s_{i,j}$ such that $G(s_{i,j}) = m_i$,
no such $s_{i,j}$ has been drawn. Second, if the same value is returned twice by $M$ before SE2 finds
a satisfactory $s_{i,j}$; in other words, if there has been a collision between two unsuccessful covertext
documents.

Let $E_1$ denote the event that one of these two situations has occurred and $n_1$ denote the value
of $j$ at which $E_1$ occurs. Then

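\begin{align*}
\Pr[E_1] &\le \left(\frac{R-1}{R}\right)^{2} p + \left(\frac{R-1}{R}\right)^{3} 2p + \cdots + \left(\frac{R-1}{R}\right)^{k-1} (k-2)p + \left(\frac{R-1}{R}\right)^{k} \\
&= p \sum_{n_1=2}^{k-1} \left(\frac{R-1}{R}\right)^{n_1} (n_1 - 1) + \left(\frac{R-1}{R}\right)^{k} \\
&\le p \left(\frac{R-1}{R}\right)^{2} \sum_{n_1=0}^{\infty} \left(\frac{R-1}{R}\right)^{n_1} (n_1 + 1) + \left(\frac{R-1}{R}\right)^{k} \\
&= p(R-1)^2 + \left(\frac{R-1}{R}\right)^{k} \\
&\le p(R-1)^2 + e^{-k/R} .
\end{align*}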
Observe that the probability that SE2 outputs a specific document $s$ which is not "Fail" can
only be less than the probability that SE1 outputs the same element. Since the total decrease over
all such $s$ is at most the probability of failure from above, the total statistical difference is at most
$2\Pr[E_1]$. □
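As a numeric sanity check on the failure bound, the sketch below sums collision terms of the form $((R-1)/R)^{n}(n-1)p$ together with the timeout term $((R-1)/R)^{k}$ and compares the result with the closed form $p(R-1)^2 + e^{-k/R}$. It rests on assumptions: the summation range $n = 2, \ldots, k-1$ is a reading of the derivation above, and the function names and parameter values are hypothetical.

```python
import math

def failure_series(p, R, k):
    """Collision terms x^n (n-1) p for n = 2..k-1, plus the timeout term x^k."""
    x = (R - 1) / R
    collisions = sum(x**n * (n - 1) * p for n in range(2, k))
    return collisions + x**k

def closed_form_bound(p, R, k):
    """The bound p (R-1)^2 + exp(-k/R) stated for Pr[E_1]."""
    return p * (R - 1) ** 2 + math.exp(-k / R)

if __name__ == "__main__":
    p, R, k = 2.0**-20, 4, 64        # illustrative values: h = 20, w = 2
    print(failure_series(p, R, k), closed_form_bound(p, R, k))
```

The closed form dominates because $\sum_{n \ge 0} x^{n}(n+1) = 1/(1-x)^2 = R^2$ for $x = (R-1)/R$, and because $(1 - 1/R)^{k} \le e^{-k/R}$.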

Lemma 4. The statistical difference between the output of SE1 and SE2 when encoding a message
$m \in \{0,1\}^{lw}$ is at most

\[
p\left(l(l+1)R^2 - l(l+3)R + 2l\right) + 2l\left(\frac{R-1}{R}\right)^{k} .
\]



Proof. Proposition 1 deals with the case $l = 1$. It remains to extend this line of analysis to the
general case $l > 1$. As in the proof of Proposition 1, let $E_i$ denote the event that SE2 outputs
"Fail" while attempting to encode the $i$th block $m_i$. Note that the probability of $E_i$ grows with $i$ because the set
$Q$ grows as more and more blocks are encoded. Also, let $n_i$ denote the number of attempts used
by SE2 to encode the $i$th block. To simplify the analysis, we initially ignore the boundary case of
failure on attempt $n_i = k$ and treat a failure on this attempt like all others. Let $E'_i$ denote these
events. Then, we have the following sequence of probabilities.
Recall that, for $E'_1$,

\[
\Pr[E'_1] \le p(R-1)^2 .
\]

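In the harder case of $E'_2$,

\begin{align*}
\Pr[E'_2] &= \sum_{n_1=1}^{k} \Pr[E'_2 \mid n_1 \text{ draws for block } 1] \, \Pr[n_1 \text{ draws for block } 1] \\
&\le \sum_{n_1=1}^{k} \sum_{n_2=1}^{k} \frac{p}{R} \left(\frac{R-1}{R}\right)^{n_1+n_2-1} (n_1 + n_2 - 1) \\
&= \frac{p}{R} \sum_{n_1=1}^{k} \left(\frac{R-1}{R}\right)^{n_1-1} \left( \sum_{n_2=1}^{k} \left(\frac{R-1}{R}\right)^{n_2} (n_2 - 1) + n_1 \sum_{n_2=1}^{k} \left(\frac{R-1}{R}\right)^{n_2} \right) \\
&\le \frac{p}{R} \sum_{n_1=1}^{k} \left(\frac{R-1}{R}\right)^{n_1-1} \left( \Pr[E'_1]/p + n_1 (R-1) \right) \\
&\le \frac{p}{R} \left( R \Pr[E'_1]/p + R^2 (R-1) \right) \\
&\le p\left( (R-1)^2 + R(R-1) \right) \\
&= p(2R-1)(R-1) .
\end{align*}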

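Similarly, for $E'_3$,

\begin{align*}
\Pr[E'_3] &\le \sum_{n_1=1}^{k} \sum_{n_2=1}^{k} \sum_{n_3=1}^{k} \frac{p}{R^2} \left(\frac{R-1}{R}\right)^{n_1+n_2+n_3-2} (n_1 + n_2 + n_3 - 1) \\
&= \frac{p}{R^2} \sum_{n_1=1}^{k} \left(\frac{R-1}{R}\right)^{n_1-1} \left( n_1 \sum_{n_2=1}^{k} \left(\frac{R-1}{R}\right)^{n_2-1} \sum_{n_3=1}^{k} \left(\frac{R-1}{R}\right)^{n_3} + \sum_{n_2=1}^{k} \sum_{n_3=1}^{k} \left(\frac{R-1}{R}\right)^{n_2+n_3-1} (n_2 + n_3 - 1) \right) \\
&\le \frac{p}{R^2} \sum_{n_1=1}^{k} \left(\frac{R-1}{R}\right)^{n_1-1} \left( R \Pr[E'_2]/p + n_1 R (R-1) \right) \\
&\le \frac{p}{R^2} \left( R^2 \Pr[E'_2]/p + R^3 (R-1) \right) \\
&\le p(3R-1)(R-1) .
\end{align*}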



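In general, for $E'_i$, we have the recurrence

\begin{align*}
\Pr[E'_i] &\le \frac{p}{R^{i-1}} \sum_{n_1=1}^{k} \left(\frac{R-1}{R}\right)^{n_1-1} \left( R^{i-2} \Pr[E'_{i-1}]/p + n_1 R^{i-2} (R-1) \right) \\
&\le \Pr[E'_{i-1}] + pR(R-1) ,
\end{align*}

which when solved yields

\[
\Pr[E'_i] \le p(iR-1)(R-1) .
\]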
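Now summing up the probability of failure for each of the $w$-bit blocks of hiddentext gives

\begin{align*}
\sum_{i=1}^{l} \Pr[E'_i] &\le \sum_{i=1}^{l} p(iR-1)(R-1) \\
&= p(R-1) \left( R \sum_{i=1}^{l} i - \sum_{i=1}^{l} 1 \right) \\
&= p \left( \frac{R^2}{2} l(l+1) - \frac{R}{2} l(l+3) + l \right) .
\end{align*}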
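The recurrence and its closed-form sum can be verified with exact rational arithmetic. This is a minimal sketch under stated assumptions: it iterates $a_i = a_{i-1} + pR(R-1)$ from $a_1 = p(R-1)^2$, and the helper names are hypothetical.

```python
from fractions import Fraction

def recurrence_bounds(p, R, l):
    """Iterate a_1 = p (R-1)^2, a_i = a_{i-1} + p R (R-1) for i = 2..l."""
    bounds = [p * (R - 1) ** 2]
    for _ in range(2, l + 1):
        bounds.append(bounds[-1] + p * R * (R - 1))
    return bounds

def closed_form_sum(p, R, l):
    """p (R^2/2 * l(l+1) - R/2 * l(l+3) + l), the claimed total over l blocks."""
    return p * (Fraction(R * R, 2) * l * (l + 1) - Fraction(R, 2) * l * (l + 3) + l)

if __name__ == "__main__":
    p, R, l = Fraction(1, 2**20), 8, 5   # illustrative values
    bounds = recurrence_bounds(p, R, l)
    print(all(b == p * (i * R - 1) * (R - 1) for i, b in enumerate(bounds, start=1)))
    print(sum(bounds) == closed_form_sum(p, R, l))
```

Using `Fraction` avoids floating-point noise, so both comparisons are exact identities rather than approximate checks.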

Next, we compute the probability of the event that the encoding of block $m_i$ fails because there
were $k$ unsuccessful attempts to find a string of $n$ covertexts which evaluates to $m_i$ under $G$, given
that no collisions occurred so far. Call this event $E''_i$. Then

\[
\Pr[E''_i] \le \left(\frac{R-1}{R}\right)^{k} .
\]



Finally, we compute the total probability of failure, which is at most the sum of the probabilities of the $E'_i$ and $E''_i$
events. That is, the probability that SE2 outputs "Fail" while encoding any of the $l$ $w$-bit blocks
$m_i$ of $m$ is at most

\[
\sum_{i=1}^{l} \Pr[E'_i] + \sum_{i=1}^{l} \Pr[E''_i] \le p \left( \frac{R^2}{2} l(l+1) - \frac{R}{2} l(l+3) + l \right) + l \left(\frac{R-1}{R}\right)^{k} .
\]

The statistical difference is at most just twice this amount. □ 

Lemma 5. The statistical difference between the output distributions of SE2 and SE3 for a random
function $G$ and hiddentext message $m \in \{0,1\}^{lw}$ is zero.

Proof. Both SE2 and SE3 abort and output "Fail" whenever the encoding of a block $m_i$ fails. This
occurs because either: (1) there are $k$ unsuccessful attempts to find $s_{i,j}$ such that $G(s_{i,j}) = m_i$;
or (2) the same document is drawn twice, i.e., there is a collision between candidate covertext
documents. Hence, SE2 evaluates $G$ at most once on each element of $\Sigma$. So, although SE3 ignores
$G$ and creates its own random function by flipping coins at each evaluation, since no element of $\Sigma$
will be re-assigned a new value, the output distributions of SE2 and SE3 are identical. □

Lemma 6. The statistical difference between the output distributions of SE3 and SE4 is equal to
the statistical difference between the output distributions of SE1 and SE2 used to encode the same
message.

Proof. As Lemma 4 shows, the probability that SE2 (and consequently SE3, by Lemma 5) outputs
"Fail" is at most

\[
p \left( \frac{R^2}{2} l(l+1) - \frac{R}{2} l(l+3) + l \right) + l \left(\frac{R-1}{R}\right)^{k} .
\]

Note that SE4 has no such output; the probabilities of each output other than "Fail" can only
increase. Hence, the total statistical difference is twice the probability of "Fail." □

These three lemmas, put together, conclude the proof of the theorem. We can save a factor
of two in the statistical difference by the following observation. Half of the statistical difference
between the outputs of SE1 and SE2, as well as between the outputs of SE3 and SE4, is due to the
probability of "Fail". Because neither SE1 nor SE4 outputs "Fail," the statistical difference between
the distributions they produce is therefore only half of the sum of the statistical differences. □
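The factor-of-two observation can be illustrated on a toy example. The sketch below assumes that statistical difference means the sum of absolute differences of probabilities (the convention under which a "Fail" probability $q$ contributes $2q$); the four distributions are invented for illustration and only mimic the roles of the outputs of SE1 through SE4.

```python
from fractions import Fraction as F

def stat_diff(P, Q):
    """Sum of |P(x) - Q(x)| over all outcomes, including "Fail"."""
    keys = set(P) | set(Q)
    return sum(abs(P.get(x, 0) - Q.get(x, 0)) for x in keys)

# Toy stand-ins: P1 plays SE1, P2 plays SE2 (= SE3), P4 plays SE4.
P1 = {"a": F(1, 2), "b": F(1, 2)}
P2 = {"a": F(3, 8), "b": F(3, 8), "Fail": F(1, 4)}   # mass moved to "Fail"
P4 = {"a": F(5, 8), "b": F(3, 8)}                    # "Fail" mass moved back

if __name__ == "__main__":
    d12, d24, d14 = stat_diff(P1, P2), stat_diff(P2, P4), stat_diff(P1, P4)
    print(d12, d24, d14)            # each of d12, d24 equals 2 * Pr[Fail]
    print(d14 <= (d12 + d24) / 2)   # the end-to-end difference is at most half the sum
```

In this example each intermediate difference equals twice the "Fail" probability, while the difference between the two failure-free distributions stays within half of their sum.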

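Theorem 5. The stegosystem STL has unreliability

\[
\mathbf{UnRel}_{STL(\kappa,w,k),\mathcal{C},l} \le l \left( 2^{w} \exp\left[-2^{h-2w-1}\right] + \exp\left[-2^{-w-1} k\right] \right) + \mathbf{InSec}^{\mathrm{PRF}}_{F}(t, d, l2^{w}) ,
\]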



where t and d are the expected running time and description size, respectively, of the stegoencoder 
and the stegodecoder combined. 

Proof. As usual, we consider unreliability if the encoder is using a truly random $G$; then, for a
pseudorandom $F$, the encoder and decoder will act as a distinguisher for $F$ (because whether
something was encoded correctly can be easily tested by the decoder), which accounts for the
$\mathbf{InSec}^{\mathrm{PRF}}$ term.

The stegoencoder fails to encode properly when it cannot find $s_{i,j}$ such that $G(s_{i,j}) = m_i$ after
$k$ attempts. We will consider separately the case where $G$ is simply unlikely to hit $m_i$ and the case where
$G$ is reasonably likely to hit $m_i$, but the samples from the channel are just unlucky $k$ times in
a row.

To bound the probability of failure in the first case, fix some channel history $\mathcal{H}$ and $w$-bit message
$m$ and consider the probability over $G$ that $G(D_{\mathcal{H}})$ is so skewed that the weight of $G^{-1}(m)$ in $D_{\mathcal{H}}$
is less than $c2^{-w}$ for some constant $c < 1$ (note that the expected weight is $2^{-w}$). Formally, consider
$\Pr_{G}\left[\Pr_{s \in D_{\mathcal{H}}}[G(s) = m] < c2^{-w}\right]$. Let $\Sigma = \{s_1, \ldots, s_n\}$ be the alphabet, and let $\Pr_{D_{\mathcal{H}}}[s_i] = p_i$.
Define the random variable $X_i$ as $X_i = 0$ if $G(s_i) = m$ and $X_i = p_i$ otherwise. Then the weight
of $G^{-1}(m)$ equals $\Pr_{s \in D_{\mathcal{H}}}[G(s) = m] = 1 - \sum_{i=1}^{n} X_i$. Note that the expected value, over $G$, of
$\sum_{i=1}^{n} X_i$ is $1 - 2^{-w}$. Using Hoeffding's inequality (Theorem 2 of [10]), we obtain

\begin{align*}
\Pr\left[1 - \sum_{i=1}^{n} X_i < c2^{-w}\right] &\le \exp\left[ \frac{-2(1-c)^2 2^{-2w}}{\sum_{i=1}^{n} p_i^2} \right] \\
&\le \exp\left[ \frac{-2(1-c)^2 2^{h-2w}}{\sum_{i=1}^{n} p_i} \right] \\
&= \exp\left[ -2(1-c)^2 2^{h-2w} \right] ,
\end{align*}

where the second to last step follows from $p_i \le 2^{-h}$ and the last step follows from $\sum_{i=1}^{n} p_i = 1$.
If we now set $c = 1/2$ and take the union bound over all messages $m \in \{0,1\}^{w}$, we get that the
probability that $G$ is skewed for at least one message is at most $2^{w} \exp\left[-2^{h-2w-1}\right]$.

To bound the probability of failure in the second case, assume that $G(D_{\mathcal{H}})$ is not so skewed.
Then the probability of failure is

\[
(1 - c2^{-w})^{k} \le \exp\left[-c2^{-w} k\right] .
\]

The result follows from setting $c = 1/2$ and taking the union bound over $l$. □
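To get a feel for the two per-block failure terms derived in the proof, the sketch below evaluates them for sample parameters. The function names and parameter values are illustrative assumptions; the formulas are the skew term $2^{w}\exp[-2^{h-2w-1}]$ and the unlucky-sampling term $\exp[-2^{-w-1}k]$ with $c = 1/2$.

```python
import math

def skew_term(h, w):
    """Probability bound that G is skewed for some w-bit message: 2^w exp(-2^(h-2w-1))."""
    return 2.0**w * math.exp(-(2.0 ** (h - 2 * w - 1)))

def unlucky_term(w, k):
    """Probability bound of k unlucky draws in a row when G is not skewed: exp(-2^(-w-1) k)."""
    return math.exp(-(2.0 ** (-w - 1)) * k)

if __name__ == "__main__":
    print(skew_term(20, 4), unlucky_term(4, 320))   # h comfortably above 2w
    print(skew_term(10, 4))                         # h close to 2w: the bound exceeds 1
```

The skew term is only meaningful when the min-entropy $h$ comfortably exceeds $2w$, and the second term shows that $k$ must be a sizeable multiple of $2^{w+1}$ for the failure probability to become small.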
A On Using Public ε-Biased Functions

Many stegosystems [11], [18], [1] (particularly public-key ones) use the following approach: they encrypt
the hiddentext using encryption that is indistinguishable from random and then use rejection
sampling with a public function $f : \Sigma \to \{0,1\}^{w}$ to stegoencode the resulting ciphertext.
For security, $f$ should have small bias on $D_{\mathcal{H}}$: i.e., for every $c \in \{0,1\}^{w}$, $\Pr_{s \in D_{\mathcal{H}}}[s \in f^{-1}(c)]$
should be close to $2^{-w}$. It is commonly suggested that a universal hash function with a published
seed (e.g., as part of the public key) be used for $f$.

Assume that the stegosystem has to work with a memoryless channel $\mathcal{C}$, i.e., one for which the
distribution $D$ is the same regardless of history. Let $E$ be the distribution induced on $\Sigma$ by the
following process: choose a random $c \in \{0,1\}^{w}$ and then keep choosing $s \in D$ until $f(s) = c$. Note
that the statistical difference between $D$ and $E$ is exactly the bias $\epsilon$ of $f$. We are interested in the
statistical difference between $D^{l}$ and $E^{l}$.

For a universal hash function $f$ that maps a distribution of min-entropy $h$ to $\{0,1\}^{w}$, the bias is
roughly $\epsilon = 2^{(-h+w)/2}$. As shown in [17], if $l < 1/\epsilon$ (which is reasonable to assume here), the statistical
difference between $D^{l}$ and $E^{l}$ is roughly at least $\sqrt{l}\,\epsilon$.
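A quick calculation makes these estimates concrete. The sketch below is a rough illustration only: it treats the approximations $\epsilon \approx 2^{(w-h)/2}$ and $\sqrt{l}\,\epsilon$ as exact, and the function names and sample parameters are hypothetical.

```python
import math

def universal_hash_bias(h, w):
    """Approximate bias of a universal hash from min-entropy h to w bits: 2^((w-h)/2)."""
    return 2.0 ** ((w - h) / 2)

def repeated_use_distance(h, w, l):
    """Approximate lower estimate sqrt(l) * eps for l documents, assuming l < 1/eps."""
    eps = universal_hash_bias(h, w)
    if l >= 1 / eps:
        raise ValueError("estimate assumes l < 1/eps")
    return math.sqrt(l) * eps

if __name__ == "__main__":
    print(universal_hash_bias(64, 8))            # 2^-28
    print(repeated_use_distance(64, 8, 2**16))   # 2^8 * 2^-28 = 2^-20
```

The estimate grows with the square root of the number of documents, so a bias that looks negligible for a single document can become noticeable over a long message.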

Hence, the approach based on public hash functions results in statistical insecurity of about 